ABSTRACT



Adaptive noise filtering is applied to an image frame of HSI 
data to reduce and more uniformly distribute noise while 
preserving image feature edges. An adaptive spatial filter 
includes a plurality of averaging kernels. An appropriate 
kernel is selected for each pixel for each of the hue and 
saturation components. A set of thresholds is defined for
selecting the kernel for the hue component. Another set of
thresholds is defined for selecting the kernel for the saturation
component. The kernel for the saturation component
is selected by comparing the intensity component to the 
saturation component thresholds. The kernel for the hue 
component is selected by comparing the product of intensity 
component and the saturation component to the hue com- 
ponent thresholds. A color gradient operation is applied to 
the filtered HSI data to aid in detecting image object 
boundaries. Object segmentation and other image process- 
ing techniques may be performed on the filtered HSI data. 

32 Claims, 13 Drawing Sheets 
IMAGE PROCESSING IN HSI COLOR
SPACE USING ADAPTIVE NOISE 
FILTERING 

CROSS REFERENCE TO RELATED 
APPLICATIONS 

This invention is a continuation of and related to U.S. 
patent application Ser. No. 09/216,692 filed Dec. 18, 1998 
(now U.S. Pat. No. 6,301,387 issued Oct. 9, 2001) of Sun et 
al. for "Template Matching Using Correlative Auto- 
Predictive Search;" U.S. patent application Ser. No. 09/216, 
691 filed Dec. 18, 1998 (now U.S. Pat. No. 6,243,494 issued 
Jun. 5, 2001) of Sun et al. for "Template Matching in Three 
Dimensions Using Correlative Auto-Predictive Search;" 
U.S. patent application Ser. No. 09/233,894 filed Jan. 20, 
1999 (now U.S. Pat. No. 6,272,250 issued Aug. 7, 2001) of 
Sun et al. for "Color Clustering for Scene Change Detection 
and Object Tracking in Video Sequences;" and U.S. patent 
application Ser. No. 09/323,501 filed Jun. 1, 1999 of Sun et 
al. for "Video Object Segmentation Using Active Contour 
Modelling With Global Relaxation." The contents of such
applications are incorporated herein by reference and made
a part hereof.

BACKGROUND OF THE INVENTION 

This invention relates to color image processing tech- 
niques such as object tracking and image segmentation, and 
more particularly to a process for filtering HSI data for 
object tracking and image segmentation. 

Color image processing techniques often are used in 
image enhancement, video encoding, video editing and 
computer vision applications. Image tracking relates to the
identification of an image object in each frame of a sequence of
image frames, such as in a sequence of motion video frames.
Image segmentation is used to identify boundaries and edges 
of image objects in an image frame. 

HSI refers to the Hue, Saturation, Intensity color model 
for presenting color data. There are many different color 
models (also referred to as color domains or color spaces) 
developed for the representation and manipulation of color 
data. Color monitors typically use a Red, Green, Blue (RGB) 
color model. Color printers typically use a Cyan, Yellow, 
Magenta (CYM) or a Cyan, Yellow, Magenta, Black 
(CYMK) color model. Color television broadcast signals 
typically use a luminance, intensity, color difference (YIQ) 
color model, where I and Q relate to chrominance. 

The Hue Saturation Intensity (HSI) color model closely 
resembles the color sensing properties of human vision. The 
intensity component is related to the luminance component 
decoupled from the color. The hue and saturation compo- 
nents are related to the way in which a human perceives 
color. Such relation to human vision makes it desirable to 
use the HSI color model for color image processing 
techniques, such as image enhancement and image segmen- 
tation. 

The input image data for color image processing tech- 
niques typically is in RGB format. Unfortunately the trans- 
formation from RGB to HSI color space and from HSI to 
RGB color space is very nonlinear and complicated in 
comparison to the conversion formulas among the other 
color models. As an example, when an RGB image is 
degraded by random noise, the nonlinearity in the conver- 
sion formulae causes the noise distribution in HSI color 
space to be nonuniform. Further, the noise distribution in 
HSI color space depends on the intensity and saturation 
values of the input data. For example, when the intensity 
value is small, the noise in the saturation and hue is large.
This creates problems in using the HSI color model for
image processing techniques, such as image enhancement
and image segmentation. Accordingly, there is a need for a
method which reduces the magnitude of the noise or the
nonuniformity of the noise variance in HSI color space.

With regard to object tracking, it is known to use data
clustering methods, such as found in pattern learning and
recognition systems based upon adaptive resonance theory
(ART). Adaptive resonance theory, as coined by Grossberg,
is a system for self-organizing stable pattern recognition
codes in real-time data in response to arbitrary sequences of
input patterns. (See "Adaptive Pattern Classification and
Universal Recoding: II . . . " by Stephen Grossberg,
Biological Cybernetics 23, pp. 187-202 (1976)). It is based
on the problem of discovering, learning and recognizing
invariant properties of a data set, and is somewhat analogous
to the human processes of perception and cognition. The
invariant properties, called recognition codes, emerge in
human perception through an individual's interaction with
the environment. When these recognition codes emerge
spontaneously, as in human perception, the process is said to
be self-organizing.

With regard to image segmentation, active contour
models, also known as snakes, have been used for adjusting
image features, in particular image object boundaries. In
concept, active contour models involve overlaying an elastic
curve onto an image. The curve (i.e., snake) deforms itself
from an initial shape to adjust to the image features. An
energy minimizing function is used which adapts the curve
to image features such as lines and edges. The function is
guided by external constraint forces and image forces. The
best fit is achieved by minimizing a total energy computation
of the curve. The energy computation is derived from (i)
energy terms for internal tension (stretching) and stiffness
(bending), and (ii) potential terms derived from image
features (edges; corners). A pressure force also has been
used to allow closed contours to inflate. Conventionally,
iterations are applied to get the entire contour to converge to
an optimal path.

SUMMARY OF THE INVENTION 

According to the invention, adaptive noise filtering is
applied to an image frame of HSI data to reduce and more
uniformly distribute noise while preserving image feature
edges. In one implementation for a sequence of image
frames, such filtering allows for improved image object
tracking ability and improved image object segmentation.

According to one aspect of the invention, it has been
found that in transforming an RGB image into HSI color
space, noise present in the RGB image is nonuniformly
distributed within the resulting HSI image. In particular the
hue and saturation components have what may be considered
to be a Cauchy distribution of noise where mean and
variance do not exist. As a result, a noise distribution model
has been determined experimentally.

According to another aspect of this invention, the HSI
data is filtered using an adaptive spatial filter having a
plurality of averaging kernels. An appropriate kernel is
selected for each pixel for each of the hue and saturation
components. A set of thresholds is defined for selecting the
kernel for the hue component. Another set of thresholds is
defined for selecting the kernel for the saturation component.

According to another aspect of this invention, the kernel
for the saturation component is selected by comparing the
intensity component to the saturation component thresholds.
According to another aspect of this invention, the kernel
for the hue component is selected by comparing the product
of the intensity component and the saturation component to the
hue component thresholds.

According to another aspect of this invention, a color
gradient operation is applied to the filtered HSI data to aid
in detecting image object boundaries.

According to another aspect of the invention, a method is
provided for segmenting an image frame of pixel data, in
which the image frame includes a plurality of pixels. For
each pixel of the image frame, the corresponding pixel data
is converted into hue, saturation, intensity color space. The
HSI pixel data then is filtered with the adaptive spatial
filters. Object segmentation then is performed to define a set
of filtered HSI pixel data corresponding to the image object.
The image frame then is encoded in which pixel data
corresponding to the image object is encoded at a higher bit
rate than other pixel data.

An advantage of the invention is that image segmentation
techniques are performed in HSI color space where color
sensing properties more closely resemble human vision.
According to another advantage of this invention, object
boundaries are preserved while noise level is significantly
reduced and the noise variance is made more uniform.
These and other aspects and advantages of the invention
will be better understood by reference to the following
detailed description taken in conjunction with the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for performing
adaptive noise filtering, video segmentation and object
tracking according to an embodiment of this invention;

FIG. 2 is a flow chart of a method for processing a
sequence of image frames to perform adaptive noise
filtering, object tracking and image segmentation according
to an embodiment of this invention;

FIGS. 3a-3c are sample images of noise in hue, saturation
and intensity components, respectively;

FIG. 4 is a chart of saturation component noise variance
versus intensity;

FIG. 5 is a 3D graph of hue component noise variance
versus intensity and saturation;

FIG. 6 is a diagram depicting multiple filtering kernels in
the adaptive spatial filter according to an embodiment of this
invention;

FIG. 7 is a chart showing sample thresholds for selecting
a filtering kernel for filtering the saturation component;

FIGS. 8a-8c are sample HSI images of an image without
noise, an image with noise which has not been filtered and
an image with noise which has been filtered, where in each
case a color gradient operation has been applied;

FIG. 9 is a diagram of an input, processing, output
sequence for the scene change detection subsystem of FIG.
1 to obtain image edges;

FIG. 10 is a flow chart for a method of pattern learning
and recognition implemented by the scene change detection
subsystem of FIG. 1;

FIG. 11 is a diagram of a template and search area for
performing a correlative auto-predictive search (CAPS);

FIG. 12 is a flow chart of a process for determining CAPS
step sizes according to an implementation of the object
tracking subsystem of FIG. 1;

FIG. 13 is a diagram of a search area of data points with
a window area to be tested against a template;

FIG. 14 is a flow chart of a process for performing a fast
search of the search area to identify local matches between
a template and a subset of window areas of the search area;

FIG. 15 is a diagram of center data points for windows in
the vicinity of the local template match to be tested for a
better match (also shown are center points for nearby
windows tested during the fast search);

FIG. 16 is a diagram of a quadrature modelling filter for
decomposing an image to achieve detailing images and a
low pass residue;

FIG. 17 is a flow chart of an active contour modelling
process for segmenting an image;

FIG. 18 is a diagram of a 5x5 pixel domain about a current
edge point (pixel) used for selecting other candidate points
which might be used in place of the current edge point;

FIG. 19 is a diagram of potential edge points processed to
preserve one optimal path for an image object boundary; and

FIG. 20 is a partial travel path of the contour in the
process of being derived from the set of points of FIG. 19.

FIG. 1 shows a system 10 for adaptive noise filtering and
image object segmentation and tracking according to one
embodiment of the invention. System 10 includes a user
interface 11, an adaptive noise filtering subsystem 13, a
subsystem 12 for detecting changes in scene (e.g., a modified
adaptive resonance theory-2 (M-ART2) subsystem),
an object tracking subsystem 14 (e.g., a 2D or 3D correlative
auto-predictive search (CAPS) subsystem), an object
segmentation subsystem 18 (e.g., an edge energy derivation
subsystem and an active contour modelling subsystem), and
an encoder subsystem 19.

The adaptive noise filtering subsystem 13 converts input
image frame data from RGB or another input format into
HSI format, then filters the HSI data and applies a color
gradient to the filtered data. In other embodiments the
adaptive noise filtering subsystem 13 need not be combined
with the other subsystems for scene change detection, object
tracking, object segmentation, energy derivation or
encoding, but may stand alone with the user interface 11, or
be combined with one or more of the same or other
subsystems to form an alternative system for image processing.

The M-ART2 subsystem 12 serves to detect scene
changes in a sequence of image frames. The CAPS
subsystem 14 serves to identify an object in a given image
frame. The CAPS subsystem also serves to track the object
among a sequence of input image frames. A motion vector
of the tracked object is maintained. The edge energy
subsystem serves to calculate the edge energy for an image
object to be modelled. The active contour modelling
subsystem serves to segment an image object and accurately
model an edge boundary of the image object being tracked.
When an operator completes enhancements, editing or
filtering of a video sequence, the encoder subsystem 19
encodes/compresses the finalized video sequence into a
desired format.

The various subsystems are implemented in software on
one or more host computing devices or are integrated into an
embedded system. Preferably the functions of the various
subsystems are performed by programmed digital computers
of the type which are well known in the art. A host computer
system for embodiments of the invention typically includes
a display monitor, a keyboard, a pointing/clicking device,
one or more processors or multiprocessors, random access
memory (RAM), a non-volatile storage device such as a hard
disk drive, and other devices such as a communication or
network interface (e.g., modem; ethernet adapter), a
transportable storage media drive, such as a floppy disk drive,
CD-ROM drive, zip drive, bernoulli drive or other magnetic,
optical or other storage media. The various components
interface and exchange data and commands through one or
more busses. The computer system receives information by
entry through the keyboard, pointing/clicking device, a
network interface or another input device or input port. The
computer system may be any of the types well known in the
art, such as a mainframe computer, minicomputer, or
microcomputer. To speed up computations (e.g., convolutions,
correlations), parallel processing may be implemented.

FIG. 2 shows a system flow chart of a method 20 for (i)
applying an adaptive noise filtering process to HSI data and (ii)
tracking and segmenting an image object defined by such
data according to an embodiment of this invention. Although
tracking and segmentation are described below as being
performed on the filtered data, the filtering process may be
applied, instead, for an alternative image processing system
in which alternative image processing techniques are
implemented.

Input to the method at steps 22 and 24 are initial edge
points and an initial image frame. In one application the
initial edge points are selected manually by an operator
using a conventional video editing application interface. In
another application the edge points are derived automatically
and fed into a method embodiment of this invention.

At steps 26-30 the adaptive noise filtering subsystem 13
performs the steps of converting the image data into HSI
format (step 26), applying adaptive spatial filtering to the
HSI data (step 28) and applying a color gradient to the
filtered HSI data (step 30). The resulting HSI data then is
analyzed at step 32 using the scene change detection
subsystem 12. In one embodiment, a modified applied
resonance theory-2 (M-ART2) process is executed as part of
step 32 to define clusters of image pixels. The M-ART2
process is described below in a separate section. At step 34,
the object segmentation subsystem 18 derives the edge
energy of the input edge boundary. Then at step
36 the subsystem 18 applies an active contour model to
segment the edge boundary and accurately model the object
boundary. The active contour model is described below in a
separate section. At step 38 the modelled image object
boundary is output. In some embodiments the output is
written to a buffer, a file, and/or to a display. In various
embodiments the RGB to HSI conversion step 26, the
adaptive spatial filtering step 28 and the color gradient step
30 may occur at any step prior to the image segmentation
steps (i.e., steps 34 and 36).

Iterative processing then is performed for subsequent
image frames. In some embodiments each image frame is
processed. In other embodiments, image frames are
periodically or aperiodically sampled. At step 39 the next image
frame to be processed is input to the method implementation
20. At steps 40-42 the adaptive noise filtering subsystem 13
performs the steps of converting the image data into HSI
format (step 40), applying adaptive spatial filtering to the
HSI data (step 41) and applying a color gradient to the
filtered HSI data (step 42). The resulting HSI data then is
analyzed at step 44 using the scene change detection
subsystem 12 to determine whether there has been a change in
scene. If a scene change is detected at step 44, then the
method 20 is complete, or is re-initialized to track another
image object. If a scene change has not occurred, then the
image object is identified from the image frame using a
correlative auto-predictive search (CAPS) process. The
CAPS process is described below in a separate section. If at
step 48 the image object is not found using the CAPS
process, then the tracking method 20 terminates or
re-initializes for tracking another object. If the object is
identified, then the edge energy for the object boundary is
derived at step 50. Then at step 52 an active contour model
is applied to segment the image boundary and accurately
model the object boundary. At the next step, step 38, the
modelled image boundary is output. As described above for
the initial image frame, in some embodiments the output is
written to a buffer, a file, and/or to a video screen. The
process then repeats steps 38-52 for another image frame.
As a result, an image object is segmented and tracked over
many image frames. Thereafter, in some embodiments an
encoding process is applied to encode the data into a desired
format (e.g., MPEG-4 video).

Adaptive Noise Filtering in HSI Color Space

One of the functions of the Filtering Subsystem 13 is to
convert the input image data into HSI format. Typically, the
input image data is in RGB format. In one embodiment the
following equations are implemented to convert from RGB
format to HSI format:

$$H = \cos^{-1}\left\{\frac{\tfrac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^2+(R-B)(G-B)\right]^{0.5}}\right\} \tag{I}$$

$$S = 1 - \frac{3}{R+G+B}\left[\min(R,G,B)\right] \tag{II}$$

$$I = \frac{1}{3}(R+G+B) \tag{III}$$

where R, G and B are the respective RGB components of the
input data;
min(R,G,B) denotes a function for the minimum of R, G
and B;
the ranges of S, I, R, G and B are in [0,1], while H is in
degrees (0 to 360°);
Hue = H where $B \le G$;
Hue = 360 - H where $B > G$.

Nonlinearity of Noise in HSI Conversion:

For an input image with data in RGB format, noise occurs
in the RGB color space. It is assumed that random gaussian
noise with zero mean and $\sigma^2$ variance occurs in the RGB
image data. In addition, the noise in each RGB color
component is assumed to be independent from one another
and also from the image data signal. As shown in Eqs.
(I)-(III), the RGB-to-HSI conversion equations are nonlinear.
For example, the noise variance of intensity (I) is $\sigma^2/3$.
However, the noise variances in hue and saturation cannot be
defined analytically since they have a kind of Cauchy
distribution, where mean and variance do not exist.
Therefore, the noise characteristics of hue and saturation
have been evaluated experimentally.

In order to measure the noise variance of hue and saturation
and to analyze the noise dependency on the image
data, several sample images are created in the HSI color
space. In one embodiment a 256x256-pixel sample image is
divided into 16x16 blocks with each block having 16x16
pixels. Each block in one sample image has constant HSI
values, as defined below:

$$H(i,j) = 64 \quad \text{for } 1 \le i \le 16,\ 1 \le j \le 16$$

$$S(i,j) = 9 + 7j \quad \text{for } 1 \le i \le 16,\ 1 \le j \le 16$$

$$I(i,j) = 9 + 7i \quad \text{for } 1 \le i \le 16,\ 1 \le j \le 16$$

where i and j are block numbers in horizontal and vertical
directions, respectively. The sample image has an intensity
value increasing horizontally while the saturation value
increases vertically. The experiment is repeated with several 
different hue values. 
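
The RGB-to-HSI conversion of Eqs. (I)-(III), which this experiment applies to the noisy RGB data, can be sketched per pixel as follows. This is a minimal illustration, not the patented implementation: the function name `rgb_to_hsi` is hypothetical, the hue selection treats B equal to G as the "Hue = H" case, and returning 0 for hue (and saturation) where the equations are undefined (gray or black pixels) is an assumption made here.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (hue_deg, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0                          # Eq. (III)
    if total == 0.0:
        return 0.0, 0.0, 0.0                         # black: H and S undefined, 0 assumed
    saturation = 1.0 - 3.0 * min(r, g, b) / total    # Eq. (II)
    denom = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0.0:
        return 0.0, saturation, intensity            # gray (R = G = B): H undefined, 0 assumed
    ratio = 0.5 * ((r - g) + (r - b)) / denom
    ratio = max(-1.0, min(1.0, ratio))               # guard against rounding outside [-1, 1]
    h = math.degrees(math.acos(ratio))               # Eq. (I), in degrees
    hue = h if b <= g else 360.0 - h                 # hue selection on B versus G
    return hue, saturation, intensity
```

Under these assumptions, pure red, green and blue map to hues of approximately 0, 120 and 240 degrees with saturation 1 and intensity 1/3.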

In each experiment the sample image in the HSI color 
space is converted to the RGB color space, and random 
Gaussian noise is added to each RGB color component. The 
noise has a Gaussian distribution with zero mean and $\sigma^2$
variance. The image with noise in the RGB color space is
reconverted to the HSI color space and the noise character- 
istics are analyzed. Noise in the HSI color space is computed 
as follows: 



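$$\begin{bmatrix} n_h \\ n_s \\ n_i \end{bmatrix} = \text{RGB to HSI}\begin{bmatrix} R + n_r \\ G + n_g \\ B + n_b \end{bmatrix} - \text{RGB to HSI}\begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{IV}$$

where RGB to HSI[ ] corresponds to the conversion Eqs.
(I)-(III) from RGB to HSI;
($n_r$, $n_g$, $n_b$) are the noises in RGB color components,
respectively; and
($n_h$, $n_s$, $n_i$) are the noises in HSI color components,
respectively.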
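
The measurement can be sketched for the saturation and intensity components as a small Monte Carlo experiment: noise in an HSI component is taken as the converted noisy pixel minus the converted clean pixel. This is an illustrative sketch only. The function `noise_variances`, the way an RGB triple is built from a target intensity and saturation, the trial count and the seed are all assumptions introduced here, and the noisy RGB values are not clipped to a valid range.

```python
import random
import statistics

def saturation(r, g, b):
    return 1.0 - 3.0 * min(r, g, b) / (r + g + b)    # Eq. (II)

def intensity(r, g, b):
    return (r + g + b) / 3.0                         # Eq. (III)

def noise_variances(i_val, s_val, sigma, trials=20000, seed=0):
    """Return (variance of n_s, variance of n_i) for one clean pixel."""
    rng = random.Random(seed)
    # One RGB triple with the requested intensity and saturation
    # (its hue is whatever results from this particular construction).
    r, g, b = i_val * (1 + s_val), i_val, i_val * (1 - s_val)
    s0, i0 = saturation(r, g, b), intensity(r, g, b)
    n_s, n_i = [], []
    for _ in range(trials):
        rn = r + rng.gauss(0.0, sigma)               # independent zero-mean Gaussian noise
        gn = g + rng.gauss(0.0, sigma)
        bn = b + rng.gauss(0.0, sigma)
        n_s.append(saturation(rn, gn, bn) - s0)      # saturation noise
        n_i.append(intensity(rn, gn, bn) - i0)       # intensity noise
    return statistics.pvariance(n_s), statistics.pvariance(n_i)
```

For example, `noise_variances(30.0, 0.3, 3.0)` and `noise_variances(150.0, 0.3, 3.0)` use the RGB noise variance of 9 from the FIG. 3 example on a 0-255 scale; the saturation noise variance should come out much larger at the lower intensity, while the intensity noise variance stays near one third of the RGB noise variance.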

FIGS. 3a-3c show the noise distribution of the Hue,
Saturation and Intensity components respectively for H=64.
In this example, the noise that is added to the RGB image,
($n_r$, $n_g$, $n_b$), has a variance of 9. As shown in FIG. 3b, the
noise in the saturation component ($n_s$) depends on the
intensity value (i.e., it is large when the intensity is small at
the left side of FIG. 3b). The noise in the hue component ($n_h$)
depends on the intensity and saturation values (i.e., it is
large when the intensity and saturation values are small in
the upper-left corner of FIG. 3a).

To show the relationship between noise and the image
data, the variance of noises in saturation and hue is analyzed
with respect to the intensity and saturation values. In FIG. 4,
the variance of $n_s$ is plotted with respect to the intensity
value, which is approximately proportional to $1/\text{Intensity}^2$.
The variance of $n_s$ also depends on the hue and saturation
values, but their effects are negligible in comparison with
that from the intensity value. FIG. 4 plots the mean value of
the variance of $n_s$ with different hue and saturation values.
FIG. 5 shows the variance of $n_h$ with respect to the intensity
and saturation values. The variance of $n_h$ also depends on the
hue value itself, but this dependency is negligible compared
with that on the intensity and saturation values. Accordingly,
in applicant's model noise in the saturation component is
taken to be proportional to the value of the intensity
component. Noise in the hue component is taken to be
proportional to the value in the intensity and saturation
components.
Adaptive Spatial Filtering: 

At steps 28 and 41 (see FIG. 2) an adaptive spatial
filtering method is executed to reduce the noise in the image
data signal. According to the method, the kernel size of an
averaging filter is adapted to make the noise distribution in
the HSI color space more uniform while preserving image
edge information. The kernel size is adapted based on noise
variance.

Referring to FIG. 6, a kernel is selected from a set 60 of
kernels K1 to K4 for each pixel according to the intensity
and saturation values. In one embodiment saturation component
threshold values ($A_s$, $B_s$, $C_s$, $D_s$) for filtering the
saturation component are defined based on the noise analysis
results in FIG. 4. For example, the filter kernel K1 is applied
when the variance of $n_s$ is between $3\sigma^2$ and $7\sigma^2$. Then the
noise variance after filtering with the K1 kernel is between
$3\sigma^2/5$ and $7\sigma^2/5$.
Similarly, the K2, K3, and K4 kernels are used when the
variance ranges of $n_s$ are $[7\sigma^2, 18\sigma^2]$, $[18\sigma^2, 35\sigma^2]$, and
$[35\sigma^2, \infty]$, respectively. The threshold values of $A_s$, $B_s$, $C_s$,
and $D_s$ are selected from the intensity axis (of FIG. 4) to
correspond (in the $n_s$ variance axis) to $3\sigma^2$, $7\sigma^2$, $18\sigma^2$, $35\sigma^2$,
respectively as shown in FIG. 7.
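
As a worked illustration of why a larger averaging kernel is assigned to a higher variance range, suppose (as an assumption made here, since the kernel shapes appear only in FIG. 6) that a kernel averages $N$ pixels with equal weights and that the noise samples $n_k$ under the kernel are independent with a common variance $v$. Then:

$$\operatorname{Var}\left(\frac{1}{N}\sum_{k=1}^{N} n_k\right) = \frac{1}{N^2}\sum_{k=1}^{N}\operatorname{Var}(n_k) = \frac{v}{N}$$

With $N = 5$ and $v$ between $3\sigma^2$ and $7\sigma^2$, the filtered variance lies between $0.6\sigma^2$ and $1.4\sigma^2$. Choosing a larger $N$ for each higher variance range therefore pulls the filtered variances of all ranges toward a common level of the order of $\sigma^2$.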

Hue component threshold values ($A_h$, $B_h$, $C_h$, $D_h$) for
filtering the hue component are defined based on the noise
analysis results in FIG. 5. The hue component threshold
values ($A_h$, $B_h$, $C_h$, and $D_h$) are selected from FIG. 5 by
using $3\sigma^2$, $7\sigma^2$, $18\sigma^2$, and $35\sigma^2$ as transition points in the $n_h$
variance axis. In alternative embodiments the number of
filter kernels and/or their shapes and coefficient values may
be varied or increased, in which case the new threshold
values are determined to make the noise distribution more
uniform. When the number of filter kernels increases, the
noise distribution is made more uniform, and the noise
variance is further reduced for extremely small intensity
and/or saturation values.

Once the saturation component threshold values ($A_s$, $B_s$,
$C_s$, and $D_s$) are established, the saturation component of the
HSI image is filtered adaptively by the filter kernel selected
for each pixel based on its intensity value according to
equation (V) below:



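$$\text{filter kernel for } S(x,y) = \begin{cases} \text{no filter}, & \text{for } A_s < I(x,y) \\ K1, & \text{for } B_s < I(x,y) \le A_s \\ K2, & \text{for } C_s < I(x,y) \le B_s \\ K3, & \text{for } D_s < I(x,y) \le C_s \\ K4, & \text{for } I(x,y) \le D_s \end{cases} \tag{V}$$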



where (x,y) are the horizontal and vertical coordinates of a 
respective image pixel. After the saturation component is 
filtered, the hue component can be filtered in a similar way 
using equation (VI) below. However, the filter kernel for 
each hue pixel is adaptively selected based on the product of 
intensity and saturation values as follows: 



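$$\text{filter kernel for } H(x,y) = \begin{cases} \text{no filter}, & \text{for } A_h < I(x,y)S(x,y) \\ K1, & \text{for } B_h < I(x,y)S(x,y) \le A_h \\ K2, & \text{for } C_h < I(x,y)S(x,y) \le B_h \\ K3, & \text{for } D_h < I(x,y)S(x,y) \le C_h \\ K4, & \text{for } I(x,y)S(x,y) \le D_h \end{cases} \tag{VI}$$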



where S(x,y) is the saturation component after filtering using 
Eq. (V). The adaptive spatial filtering improves the satura- 
tion and hue noise characteristics significantly by reducing 
noise level and by making the noise distribution more 
uniform. The smoothing filters reduce the random noise and 
smooth the image details. To avoid blurring the image 
details, in one embodiment an image edge-preserving pro- 
cedure (equation VII) is applied during adaptive filtering as 
follows: 



$$\text{filter coefficient at } (u,v) = \begin{cases} 0, & \text{if } |I(u,v) - I(x,y)| > 2\sigma \\ 1, & \text{if } |I(u,v) - I(x,y)| \le 2\sigma \end{cases} \tag{VII}$$



where (x,y) is the center pixel of the kernel (i.e., the pixel
to be filtered), and (u,v) are other pixels in the filter kernel.
In equation (VII), $\sigma$ is the standard deviation of noise in the
RGB color space. If the threshold value in equation (VII) is
too large, the image edges end up being smoothed by the
adaptive spatial filtering. It has been found that the threshold
value of $2\sigma$ was effective to handle about 90% of noise in the
intensity component because the variance of $n_i$ is $\sigma^2/3$. In
various applications, the noise variance, $\sigma^2$, is measured or
estimated in an RGB image.
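
The kernel selection of equations (V) and (VI) and the edge-preserving rule of equation (VII) can be combined in a short sketch. This is a hedged illustration rather than the patented filter: the kernel shapes in `KERNELS` (a 5-point cross and 3x3, 5x5 and 7x7 squares) are hypothetical stand-ins for K1 to K4 of FIG. 6, the function names are introduced here, pixels outside the image are simply skipped, and the hue is averaged without handling wrap-around at 0/360 degrees.

```python
def select_kernel(value, thresholds):
    """Return 0 (no filter) or 1..4 (K1..K4) per Eq. (V)/(VI).

    thresholds = (A, B, C, D) with A > B > C > D.
    """
    a, b, c, d = thresholds
    if value > a:
        return 0
    if value > b:
        return 1
    if value > c:
        return 2
    if value > d:
        return 3
    return 4

def square_offsets(radius):
    span = range(-radius, radius + 1)
    return [(dv, du) for dv in span for du in span]

# Hypothetical kernel shapes; the actual K1..K4 are defined in FIG. 6.
KERNELS = {
    1: [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)],   # 5-point cross
    2: square_offsets(1),                            # 3x3
    3: square_offsets(2),                            # 5x5
    4: square_offsets(3),                            # 7x7
}

def adaptive_filter(channel, selector, intensity, thresholds, sigma):
    """Average `channel` with a per-pixel kernel chosen from `selector`.

    A neighbor contributes (coefficient 1) only if its intensity is within
    2*sigma of the center pixel's intensity, as in Eq. (VII).
    """
    rows, cols = len(channel), len(channel[0])
    out = [row[:] for row in channel]
    for y in range(rows):
        for x in range(cols):
            k = select_kernel(selector[y][x], thresholds)
            if k == 0:
                continue                             # "no filter" case
            total, count = 0.0, 0
            for dv, du in KERNELS[k]:
                v, u = y + dv, x + du
                if (0 <= v < rows and 0 <= u < cols
                        and abs(intensity[v][u] - intensity[y][x]) <= 2 * sigma):
                    total += channel[v][u]
                    count += 1
            out[y][x] = total / count                # center pixel always counts, so count >= 1
    return out
```

In this sketch the saturation would be filtered first with the intensity as selector, `sat_f = adaptive_filter(sat, inten, inten, sat_thresholds, sigma)`, and the hue afterwards with the per-pixel product of intensity and filtered saturation as selector.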
Applying Color Gradient: 

A color gradient image of the filtered input frame is 
derived at steps 30 and 42. The color gradient image is 
obtained by applying a derivative of Gaussian (DOG) opera- 
tor to each HSI pixel component in the filtered image. 
Equation (VIII) below characterizes the application of the 
color gradient to the filtered image resulting from step 28 or 
step 41: 

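$$\text{color gradient} = \sqrt{\nabla H(x,y)^2 + \nabla S(x,y)^2 + \nabla I(x,y)^2} \tag{VIII}$$

where

$$\nabla H(x,y)^2 = \left(H(x,y) * G_h(x,y)\right)^2 + \left(H(x,y) * G_v(x,y)\right)^2;$$

$$\nabla S(x,y)^2 = \left(S(x,y) * G_h(x,y)\right)^2 + \left(S(x,y) * G_v(x,y)\right)^2; \text{ and}$$

$$\nabla I(x,y)^2 = \left(I(x,y) * G_h(x,y)\right)^2 + \left(I(x,y) * G_v(x,y)\right)^2.$$

In equation (VIII), $G_h(x,y)$ and $G_v(x,y)$ are the gradient
operators in the horizontal and the vertical directions,
respectively. The symbol * denotes a convolution operation.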
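
A color gradient of this form can be sketched with separable derivative-of-Gaussian operators. The sketch below is illustrative only: the Gaussian width `sigma`, the kernel `radius`, the edge handling (border pixels are replicated) and the function names are assumptions introduced here, and the gradient magnitude is taken as the square root of the summed squared responses, as equation (VIII) is read here.

```python
import math

def dog_kernels(sigma=1.0, radius=3):
    """Return a sampled 1-D Gaussian and its derivative on [-radius, radius]."""
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in xs]
    norm = sum(g)
    g = [v / norm for v in g]
    dg = [-x / (sigma * sigma) * v for x, v in zip(xs, g)]
    return g, dg

def clamp(i, n):
    return max(0, min(n - 1, i))

def convolve_separable(img, k_row, k_col):
    """Convolve along x with k_row, then along y with k_col (same radius)."""
    rows, cols = len(img), len(img[0])
    r = len(k_row) // 2
    tmp = [[sum(k_row[j + r] * img[y][clamp(x - j, cols)] for j in range(-r, r + 1))
            for x in range(cols)] for y in range(rows)]
    return [[sum(k_col[j + r] * tmp[clamp(y - j, rows)][x] for j in range(-r, r + 1))
             for x in range(cols)] for y in range(rows)]

def color_gradient(h, s, i, sigma=1.0, radius=3):
    """Combine horizontal and vertical gradient responses of H, S and I."""
    g, dg = dog_kernels(sigma, radius)
    total = [[0.0] * len(h[0]) for _ in h]
    for comp in (h, s, i):
        gx = convolve_separable(comp, dg, g)   # horizontal gradient operator
        gy = convolve_separable(comp, g, dg)   # vertical gradient operator
        for y in range(len(comp)):
            for x in range(len(comp[0])):
                total[y][x] += gx[y][x] ** 2 + gy[y][x] ** 2
    return [[math.sqrt(v) for v in row] for row in total]
```

On a constant image the result is (numerically) zero everywhere, and on an image with a vertical step in one component the response peaks along the step.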

FIG. 8a shows a color gradient output image for a sample
HSI image in which there is no noise present. FIG. 8b shows
a color gradient output image for the same sample HSI
image, but in which there is noise present. The adaptive
filtering is not performed for the image of FIG. 8b. FIG. 8c
shows a color gradient output image for the same HSI image
with the same noise present as in FIG. 8b, but where the
adaptive filtering steps 28 or 41 are performed prior to
applying the color gradient. As evidenced in FIGS. 8b and
8c, the noise is definitely reduced in the color gradients with
the adaptive spatial filtering.

Pixel Clustering and Scene Change Detection 

In one embodiment the scene change detection subsystem 
12 is based upon a method of modified applied resonance 
theory as described in the commonly-assigned U.S. patent 
application Ser. No. 09/233,894, filed Jan. 20, 1999 for 
"Color Clustering for Scene Change Detection and Object 
Tracking in Video Sequences." The content of such appli- 
cation is incorporated herein by reference and made a part 
hereof. 

The subsystem 12 performs pattern learning and recog- 
nition on a sequence of input image frames. Referring to 
FIG. 9, the subsystem 12 processes a current image frame 60 
grouping the image frame contents into clusters 66. The 
image frame 60 is formed by an array of image pixels P. For 
a raster type image frame, the image pixels are arranged into 
y rows and x columns. In various embodiments the image 
pixels are color image pixels coded according to a standard 
red, green, blue coding scheme (e.g., NTSC), a standard 
yellow, magenta, cyan and black coding scheme (YMCK), 
a standard luminosity, chrominance, brightness coding 
scheme (e.g., YUV), the hue saturation intensity color 
scheme (HSI), or some other color coding scheme. For the 
embodiment for the process of FIG. 2 the conversion of 
RGB data to HSI data occurs prior to the M-ART2 26 steps. 
Accordingly, HSI data is used for such embodiment. In 
various embodiments the RGB to HSI conversion may occur 
at any step prior to the image segmentation steps (i.e., steps 
28 and 30 to generate edge energy 28 and apply active 
contour model 30). 

Each image frame is a set of data points. Each pixel is a 
data point. A data point is referred to herein as an input 
vector. An input vector corresponds to a pixel P(x,y), which 
for an HSI coding scheme has a value (H,S,I). The 
subsystem 12 processes a sequence 68 of input vectors P 
corresponding to a given set of data points (i.e., a current 
image frame 60). The input vectors P are grouped into 
clusters 66. 

Each cluster 66 is a learned or a recognized pattern. For 
a first set of input data (i.e., an initial image frame) there is 
no prior information for allocating the data points into 
clusters. Thus, the patterns are learned. For subsequent sets 
of data points (e.g., subsequent images in a sequence of 
image frames), the patterns previously learned may be used. 
Specifically, data points for a current set of data points 
(image frame) are tested to try to recognize the prior 
patterns in the new set of data points. The process for 
analyzing the subsequent sets of data points is a recognition 
process. During the recognition process, the previously 
learned patterns also are updated and modified based upon 
the new data. 

Pattern Learning and Recognition: 

Referring to FIG. 10, a flow chart of the pattern learning 
and recognizing process (also see steps 32 and 44 of FIG. 2) 
commences at step 76. If the current image frame is an initial 
image frame, then at step 78 various parameters are reset. 
Further, if the current image frame is an initial image frame 
then there are no clusters that have been started. 

The current image frame 60 is processed in an iterative 
manner (step 80). At step 82, an initial set of prototype 
vectors for this processing iteration of the current image 
frame is obtained. There is a prototype vector for each 
cluster defined. If the current image frame is an initial image 
frame, then there are no prototype vectors. The prototype 
vector is a weighted centroid value based upon a history of 
input vectors allocated to the corresponding cluster. 

The process for allocating input vectors into clusters is 
performed for each input vector (step 84). Such process is 
based upon a minimum distance measure. In various 
embodiments a euclidean distance, an absolute distance or 
some other distance measure is used. In one embodiment the 
euclidean distance is used. An input vector is allocated to the 
cluster whose prototype vector is at a minimal euclidean 
distance from it. At step 86, the prototype vector 
closest to the input vector is found. As a self-organizing 
control for allocating data into clusters, a vigilance 
parameter, also referred to herein as a vigilance value, is 
used. A vigilance test is performed at step 88. If the 
minimum euclidean distance is not less than the vigilance 
value, then a new cluster is defined at step 90. The input 
vector is assigned to such new cluster and becomes the 
initial prototype vector for such new cluster. If the minimum 
euclidean distance is less than the vigilance value, then the 
input vector is assigned to the cluster corresponding to the 
closest prototype vector at step 92. Thus, an input vector is 
allocated to a preexisting cluster or a new cluster. 
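
The allocation of steps 84 to 92 can be sketched as follows. 
This is a minimal illustration, not the implementation of the 
referenced application. The function name, argument names and 
the treatment of (H,S,I) as plain coordinates (ignoring the 
circular nature of hue) are assumptions. The prototype update 
uses the running weighted centroid that the specification 
describes for step 94. 

```python
import math

def allocate(vectors, vigilance, prototypes=None, counts=None):
    """Allocate input vectors to clusters by minimum euclidean distance.

    Hypothetical helper: names and return shape are illustrative only.
    """
    prototypes = [] if prototypes is None else [list(p) for p in prototypes]
    counts = [] if counts is None else list(counts)
    labels = []
    for v in vectors:
        # Step 86: find the prototype vector closest to the input vector.
        best, best_d = None, math.inf
        for k, w in enumerate(prototypes):
            d = math.dist(v, w)
            if d < best_d:
                best, best_d = k, d
        # Step 88: vigilance test.
        if best is None or best_d >= vigilance:
            # Step 90: the input vector starts a new cluster and is
            # its initial prototype vector.
            prototypes.append(list(v))
            counts.append(1)
            labels.append(len(prototypes) - 1)
        else:
            # Step 92: assign to the closest cluster, then update the
            # prototype as a weighted centroid (step 94).
            n = counts[best]
            prototypes[best] = [(x + wk * n) / (n + 1)
                                for x, wk in zip(v, prototypes[best])]
            counts[best] = n + 1
            labels.append(best)
    return labels, prototypes, counts
```

Passing the prototypes and counts returned for one frame into 
the call for the next frame mirrors the hold-over of learned 
patterns between frames. 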

For a new learning and recognition process, there are no 
prototype vectors to start with. Thus, the first input vector 
will define an initial prototype vector for a first cluster. The 
minimum distance between the next input vector and the 
prototype vectors will be to the first prototype vector (since 
at this point in the example there is only one prototype 
vector). If such minimum distance exceeds the vigilance 
value, then the second input vector becomes an initial 
prototype vector for a second cluster. If, however, such 
minimum distance is within the vigilance value distance, 
then the second input vector is allocated to the first cluster. 
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If the second input vector is allocated to the first cluster, vector for a new cluster. According to various embodiments, 

then the prototype vector for such first cluster is modified at either the same or a different vigilance value is used during 

step 94. The modified prototype vector for the first cluster the subsequent iterations. 

becomes the weighted centroid value for all data points Upon identifying a cluster into which an input vector is 

among the first cluster, based upon the following equation: 5 allocated during a subsequent iteration, the prototype vector 

(i.e., weighted centroid) for such cluster is recalculated. 

^ MW) _ PQc, y) + H^'||chtffgrf*°|l During the subsequent iteration the number of input vectors 

* " Wduster^w + 1 in the cluster is not reset, but remains at its last count from 

the prior iteration. Thus, the weighting influence of the 
10 current input vector is less during the subsequent iteration 

where, W^^-new prototype vector for cluster k=new than during the prior iteration. 

centroid value; Af ter tne subsequent iteration is complete, like in the prior 

W^-old prototype vector for cluster k=old centroid iteration, any cluster having fewer than a prescribed thresh- 

value; old number of input vector members is discarded (step 96). 

P(x,y)=input vector; 15 The clusters then are tested for convergence (step 98) to see 

|| cluster/^H-number of vectors in cluster k. if the number of input vector members in each cluster has 

The influence of the new input vector in the cluster has a significantly changed. If the number has not changed 

weighted influence on the prototype vector of the cluster. significantly, then the iterative process is complete. In this 

The weight is proportional to the number of input vectors in sense, the process is self-stabilizing. If a cluster was dis- 

the cluster, and thus, corresponds to a statistical centroid. 20 carded for such iteration, such discarded cluster is consid- 

This process for updating the prototype vector provides a ered to be an outlier and the members are considered as 

self-scaling feature to the cluster learning and recognition noise. 

process. The number of cluster members is considered to change 

This process is used for allocating each input vector of the significantly if it has changed by more than a prescribed 

current image frame. Once all the input vectors have been 25 number of data points or prescribed percentage, whichever 

allocated in a given iteration, testing is performed to deter- is larger. Such number and percentage are defined empiri- 

mine whether another iteration is needed and whether outlier cally. If the number of members has changed significantly 

clusters are present. then a new iteration is performed (step 80). In the new 

For an initial data set where no information is previously iteration, the remaining (e.g., non-discarded) prototype vec- 

stored, one or more initial clusters are defined as above. An 30 tors from the immediately prior iteration are used as the 

iterative process is used, however, to achieve a self- initial prototype vectors for each remaining cluster (step 82). 

stabilizing quality to the clusters. Specifically, once the The iterations continue until, either the number of members 

entire data set has been processed, allocating the input in each cluster is not changed significantly (convergence test 

vectors into clusters, another iteration of allocating the input at step 98), or a prescribed maximum number of iterations 

vectors into clusters is performed. Prior to performing 35 has occurred. Such maximum number of iterations is deter- 

another iteration, however, the clusters are analyzed for mined as a matter of design or empirically, 

quantity in an outlier test (see step 96). According to such For a current image frame which is subsequent to an 

test, any cluster having less than a prescribed threshold initial image frame, the prototype vectors correspond to the 

number of input vector members is discarded. More spe- final prototype vectors from the preceding image frame 

cifically the prototype vector is discarded and thus not used 40 processed among the sequence of image frames being pro- 

in finding a minimum distance to input vectors during a cessed. Each input vector in such current image frame is 

subsequent iteration. The input vectors in the discarded allocated to a cluster by determining the prototype vector to 

cluster are considered to be outliers (e.g., noise). which it has a minimum euclidean distance (step 86). If such 

Consider, for example, a data set including 30,000 data minimum distance is less than the vigilance value (step 88), 

values. Also, consider that after the first iteration, a first 45 then the input vector is allocated to the cluster corresponding 

cluster has 20,000 members, a second cluster has 8,000 to that prototype vector (step 92). If such minimum distance 

members, a third cluster has 1985 members, and a fourth exceeds the vigilance value, then the input vector defines a 

cluster has 15 members. In this example, assume the pre- prototype vector for a new cluster (step 90). A new cluster 

scribed threshold value is 64. Because cluster 4 has less than corresponds to a new prototype pattern. According to vari- 

64 input vector members, it is discarded. It is expected that 50 ous embodiments, either the same or a different vigilance 

many of the input vectors in this fourth cluster will be value is used for the subsequent image frames in the 

allocated into another cluster during a subsequent reitera- sequence relative to that used for an initial image frame. In 

lion. Note that this is an example, and that the threshold a preferred embodiment, the vigilance value is increased for 

value may be prescribed as a matter of design, or based upon the subsequent data sets, relative to that for the initial data 

empirical analysis. 55 set. 

For the next iteration the prototype vectors from the Upon identifying a cluster into which an input vector is 

remaining clusters of the prior iteration are retained (step 82 allocated, the prototype vector (i.e., centroid) for such 

of next iteration). In our example above, the prototype cluster is recalculated. The number of input vectors in the 

vectors from the first three clusters are retained, while the cluster is held over from the processing of the prior image 

prototype vector from the fourth cluster is discarded. Each 60 frame. Thus, the prototype vector is a weighted centroid 

input vector then is re-allocated to a cluster during this based upon multiple iterations of multiple image frames in 

subsequent iteration by determining the prototype vector to a sequence of image frames. 

which it has a minimum euclidean distance. If such mini- After all the input vectors of the current data set have been 

mum distance is less than the vigilance value, then the input allocated into clusters, another iteration of allocating the 

vector is allocated to the cluster corresponding to that 65 input vectors into clusters is performed. Prior to performing 

prototype vector. If such minimum distance exceeds the another iteration, however, the clusters are analyzed for 

vigilance value, then the input vector defines a prototype quantity in the outlier test (step 96). Any cluster having less 
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than a prescribed threshold number of input vector members processing of the first input vector, such input vector will 

is discarded as described above for the initial data set. For define a new cluster and become the prototype vector for 

the subsequent iteration the prototype vectors from the such cluster (step 90). Additional cluster then are defined 

remaining clusters of the first iteration are retained. Each based upon whether the current input vector is farther than 

input vector then is re-allocated to a cluster during the 5 the vigilance value distance away from the prototype vector 

subsequent iterations in the same manner as described (s). Note that initially there are no prior input vectors in each 

above. new cluster (cluster count«0 when first deriving the 

Each image frame in the sequence is similarly processed. weighted centroid of a new cluster). 
In a preferred embodiment, the starting prototype vectors for 

allocating input vectors of a current data set are the final 30 Correlative Auto-Predictive Search (CAPS)— 

prototype vectors obtained during processing of the imme- Object Tracking 

diately prior data set. Further the count of the number of A preferred embodiment of the correlative auto-predictive 

input vectors in a clusters is held over from prior iterations pr0 cess is described in the commonly-assigned U.S. 

and prior image frames. New clusters defined as the patent application Ser. No. 09/216,692, filed Dec. 18, 1998 

sequence of data clusters continue correspond to new pro- 15 ( Q0W tj.S. Pat. No. 6,301387 issued on Oct. 9, 2001) for 

totype patterns. New prototype patterns may occur in an "Template Matching Using Correlative Auto-Predictive 

image sequence, for example, due to an image object Search." The content of such application is incorporated 

insertion, deletion or change. herein by reference and made a part hereof. 

Detecting Scene Changes Within a Sequence of Image ^ CAps pf0CCSS ^ execmed fof image frames folk)w _ 

frames. „ 20 ing an initial image frame. The object to be tracked has been 

In the course of processing a sequence of image frames of defined durf processing of the ^ im frame . ^ 

a common scene it is expected that much of the image objec , location ^ dated ^ ^ CAps ^ durf 

content is similar from image frame to image frame As a processing 0 f subsequent image frames. The initial object or 

result, me aennea clusters will De similar trom image rrame the updated 0 y ect from tD6 prior frame 

serves as a template 

to image frame, rhe hold over of the count of input vectors 25 for locatin the object m , he mt ■ &ame Refcrri 

in a cluster used in wanting the centroid of the cluster is [0 FIQ u the objecl bej tracked Mrves M a , ate m 

based upon such assumption If while processing a given whi)e (he current . frame M a seafch afea m 

image frame however, .t is deterrnmed that the prototype ^ hte 108 b ove{Md OQt0 a window m wi(hin (he 

vectors for ^ each one of several clusters have ^changed seafch afea m A motion vec(or ^ maintained wbjch 

beyond a threshold amount, then it is considered that the 30 iden ,ifiesuie change in location of the object from one frame 

scene being imaged has changed^ Specifically, upon pro- {Q the ncx , , n ^ embodiments the motion vector derived 

cessing any given image frame, if more than a prescribed froffl me ious frame ^ ^ tQ ^ a ^ 

number of prototype vectors has changed by more than a ^ 

predetermined amount, then a scene change is considered to * , . , 

have occurred 35 c tem P^ ate data P 0Lnts are compared to the win- 

Ascene change is determined by tracking a cluster change dow 'f U , 2 data . P™* t0 jf the dala f 5 ^ 

ratio from image frame to image frame. Specifically, after f orre a e '° a desir * d de 8 ree ' If ,he y do ' ' a matd ! 

the iterative processing of input vectors for a current image ^mplate has been found, n a search area 110 formed by 'm 

frame is complete, the cluster rate of change for that image ' ows of . n dala f" 1 s ' a ' em P la e k !? ws of P t 

frame is derived. Cluster rale of change is derived in a 40 data pomts may be placed over (m-k+l)*(n-p + l) potential 

preferred embodiment using the following equation: win ows 

To reduce the number of windows 112 that the template 

f 108 is compared with, an effective step size is derived from 

V \N f N f ~ l \ tem P^ ate * According to a 2-dimensional implementation 

fz{ k k 45 embodiment, a step size along a first axis 114 is derived and 

R ~ a ste P s ^ e a l° n 8 a second axis 116 is derived. Rather then 

compare the template to every possible window of the 
search area 110, the template 108 is moved along either or 

where, R^cluster change ratio for image frame f; both of the first axis 114 and second axis 116 by the 

N/=number of input vectors in cluster k of frame f (actual 50 corresponding first axis step size or second axis step size, 

number, not the count used in prototype vector centroid Once the desired step sizes are derived, then the template 

which counts input vector for each iteration); 108 is compared to the various windows 112 of the search 

N /OM/ =total number of input vectors in image frame f; and area 110 at the step size increments during a fast search 

n/«number of clusters in frame f. process. In one embodiment the comparison is a correlation 

Note that if the k-th cluster in frame f is a new cluster, then 55 function of the template 108 and the window 112 and results 

N/* 1 is simply zero. A scene change is identified at step 44 in a correlation coefficient. Any window 112 in which the 

(see FIG. 9) when the cluster change ratio for an image correlation coefficient with the template 108 is found to 

frame f exceeds a prescribed value, (e.g., 5%-10%). The exceed a specific value is a local match for the template. In 

prescribed value is determined empirically or be design and a preferred embodiment the specific value is the cut value 

may exceed the example values of 5%-10%. 60 times a threshold value. 

If a scene change is detected for a current image frame f, Next, a full search then is performed in the vicinity of any 

then the method 20 terminates, or is restarted (at step 22) location which is a local match. A full search of such vicinity 

with the current image frame f set to be an initial frame. encompasses performing a correlation between the template 

Image frame f then is re-processed as the current frame. and every potential search area window between the local 

Since it is an initial frame, parameters are reset at step 78. 65 match location window and the windows at the prior and 

Specifically, the prototype vectors are discarded. Thus at next step in each of the horizontal and vertical axes. For 

step 82 there are no prototype vectors. As a result, during example, if the horizontal step size is 3 pixels and the 
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vertical step size is 4 pixels, then correlations are performed 
for windows ±1 pixel and ±2 pixels along the horizontal axis 
and ±1 pixel, ±2 pixels and ±3 pixels along the vertical axis. 
In addition correlations are performed for windows off the 
axes within the area delineated by the step sizes. Thus, the 5 
full search of the vicinity of the local match for this example 
includes (2*2+l)*(2*3+l)-l=34 correlations between the 
template and the search area. Any locations among the local 
match locations and the locations tested during the full 
search of the vicinity which exceed the threshold value are 1Q 
considered template matches. In some embodiments, only 
the location having the highest correlation is considered a 
match. In other embodiments there may be multiple 
matches. Thus, the top matches or all matches above the 
threshold are selected as resultant matches. 
Determining Step Size: 

To determine effective step sizes, the template 108 itself 
is analyzed. Referring to FIG. 12, at a first step 120 the 
template 108 is padded with additional data points to 
achieve a padded template. For circular padding, multiple 
copies of the template 108 are used to increase the template 
size. The number of copies may vary for differing 
embodiments. In a preferred embodiment there are at least 9 full 
copies of the template in the circularly padded template. In 
another embodiment, a padded template is achieved by 
linear padding. For linear padding, data points are added in 
which each data point has a common value. The common 
value is a padding constant. In one embodiment the padding 
constant may be 0 or another fixed value. In a preferred 
embodiment the padding constant is derived from the data 
values of the various data points which make up the template 
108. For example, in one embodiment an average data value 
is derived for all the template 108 data points using any of 
various averaging techniques. This average value serves as 
the padding constant. For image data, the added data points 
are pixels and the padding constant is a pixel intensity and/or 
color. Preferably the center window of the padded template 
formed by linear padding also is formed by the original 
template 108. 
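
The two padding options of step 120 can be sketched as 
follows. This is an illustration only. The function names, 
the 3-by-3 tiling (which yields the 9 full copies mentioned 
above) and the use of an arithmetic mean as the padding 
constant are assumptions. The template is taken to be a list 
of equal-length rows of scalar data values. 

```python
def pad_circular(template, copies=3):
    """Tile the template copies x copies times (3 x 3 gives 9 copies)."""
    # Each row is repeated horizontally, and the block of rows is
    # repeated vertically, so the center window equals the template.
    return [row * copies for _ in range(copies) for row in template]

def pad_linear(template, border_rows, border_cols):
    """Surround the template with a constant equal to its mean value."""
    rows, cols = len(template), len(template[0])
    mean = sum(map(sum, template)) / (rows * cols)  # padding constant
    width = cols + 2 * border_cols
    padded = [[mean] * width for _ in range(rows + 2 * border_rows)]
    # Copy the original template into the center window.
    for r, row in enumerate(template):
        padded[border_rows + r][border_cols:border_cols + cols] = row
    return padded
```

In both cases the center window of the result is the original 
template, which is the property the correlation step relies 
on. 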

Referring again to FIG. 12, at another step 122 the 
template 108 is correlated to various windows of the padded 
template. Because the center of the padded template equals 
the original template 108, it is known that the correlation 
between the template 108 and the center window is 1.0. 
Thus, that correlation need not be calculated. It is already 
known. For a two dimensional analysis, correlations 
between the original template 108 and windows of the 
padded template are derived for windows along either of 
such axes 114, 116 moving in either direction away from the 
center window. The step size for selecting adjacent windows 
to evaluate is one data point. Consider for example a 
template which is 40 pixels by 60 pixels and a padded 
template which is 120 pixels by 180 pixels. The step size is 
one pixel. Starting from the center window, there are 40 
potential windows in a first direction along the first axis 114 
and 40 potential windows in a second, opposite direction 
along the same axis 114. In step 122 a correlation is 
performed between the template and the selected windows. As 
the selected window changes along the first axis 114 in the 
first direction, the resulting correlation coefficient is likely to 
decrease below 1.0. Eventually there will be a window 
where the correlation coefficient falls to a prescribed cut-off 
value. Such cut-off value may vary for differing 
embodiments, but preferably is less than a threshold value 
which identifies an estimated match between a window and 
the template. A window will be found in the padded template 
in each direction along axis 114 where the cut-off criterion is 
met. 



Rather than perform a correlation for each potential 
window along the first axis 114, correlations are performed 
for windows along the axis 114 away from the center 
window in each direction until a window is identified in such 
direction where the correlation coefficient intersects the 
cut-off value. For two dimensional analysis, there is a cut-off 
point found in each direction from the center window along 
the first axis 114. The distance between those two windows 
in data points is the width along the first axis. 

Referring to FIG. 12, at step 124 the first axis step size is 
derived from the width along the first axis 114 between 
windows which have a correlation to the template 108 equal 
to or less than the prescribed cut-off value. The step size 
along the first axis 114 is a fraction of the width. In a 
preferred embodiment, one-half the width is taken as the 
step size for the given axis. In other embodiments, the step 
size is taken as the entire width or some other fraction of the 
width. 

In steps 126 and 128 the correlations are repeated along 
the second axis 116 in two opposing directions to find a 
width along the second axis 116. For two dimensional 
analysis, there is a cut-off point found in each direction from 
the center window along the second axis 116. The distance 
between those two windows in data points is the width along 
the second axis. A fraction of this distance is taken as the 
step size for the corresponding axis (e.g., first axis, or 
horizontal, step size; second axis, or vertical, step size). In 
a preferred embodiment, one-half the width is taken as the 
step size. In other embodiments, the step size is taken as the 
entire width or some other fraction of the width. Preferably, 
the step size along the second axis 116 is derived in the same 
manner as the step size along the first axis 114. The step 
sizes are referred to herein as correlative auto-predictive 
search ('CAPS') step sizes. 
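
The derivation of a CAPS step size along one axis (steps 122 
to 128) can be sketched as follows. This is a minimal 
illustration under stated assumptions: circular padding with 
3-by-3 copies, a hypothetical cut-off value of 0.5 by default, 
one-half the width as the step size, and a correlation of 0.0 
returned for a constant window. The names `corr`, `window` 
and `caps_step` are illustrative and do not come from the 
referenced application. 

```python
import math

def corr(a, b):
    """Correlation coefficient of two equal-length data sets."""
    n = len(a)
    ea, eb = sum(a) / n, sum(b) / n
    cov = sum((x - ea) * (y - eb) for x, y in zip(a, b)) / n
    sda = math.sqrt(sum((x - ea) ** 2 for x in a) / n)
    sdb = math.sqrt(sum((y - eb) ** 2 for y in b) / n)
    if sda == 0 or sdb == 0:
        return 0.0  # assumption: constant data carries no correlation
    return cov / (sda * sdb)

def window(padded, top, left, rows, cols):
    """Flatten the rows x cols window whose corner is (top, left)."""
    return [padded[r][c]
            for r in range(top, top + rows)
            for c in range(left, left + cols)]

def caps_step(template, axis, cut=0.5):
    """Step size along axis 0 (horizontal) or axis 1 (vertical)."""
    rows, cols = len(template), len(template[0])
    padded = [row * 3 for _ in range(3) for row in template]
    flat = [v for row in template for v in row]
    limit = cols if axis == 0 else rows
    cutoffs = []
    for direction in (1, -1):
        shift = 1
        # Move one data point at a time away from the center window
        # until the correlation falls to the cut-off value.
        while shift < limit:
            dr, dc = ((0, direction * shift) if axis == 0
                      else (direction * shift, 0))
            w = window(padded, rows + dr, cols + dc, rows, cols)
            if corr(flat, w) <= cut:
                break
            shift += 1
        cutoffs.append(shift)
    width = sum(cutoffs)       # distance between the two cut-off windows
    return max(1, width // 2)  # one-half the width, at least one point
```

A template with fine detail decorrelates after a small shift 
and so yields a small step size, while a smooth template 
yields a larger one. 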
Fast Search: 

Once the CAPS step sizes have been derived, a fast search 
is performed comparing the template 108 to the search area 
110. It is a fast search in the sense that not every potential 
window of the search area is compared to the template. 
Referring to FIG. 13, the search area 110 is shown as an 
array of data points 74, 75 such as image pixels. The 
two CAPS step sizes are used for selecting windows from 
the search area 110 to be compared to the template. The data 
points in the search area 110 about which the template is 
centered during successive steps are designated with an open 
circle and part number 75. Other data points in the search area 
which are not center points are designated as a data point 74. 

Referring to FIG. 14, at a step 136 the template 108 (see 
FIG. 11) is overlaid to a starting window 112 of the search 
area 110. The starting window can be any window of the 
search area. In a preferred embodiment the starting window 
112 is selected by predicting the object location with the 
motion vector, derived for the previous frame. In one 
embodiment a linear prediction calculation is implemented, 
although other more complex prediction algorithms also 
may be used. 

At step 138 a correlation is performed between the 
template 108 and the starting window and every +/-x-th 
window along the first axis 114, where x is the first axis step 
size. Thus, for a horizontal axis step size of 'x', the template 
is shifted along the horizontal axis 114 by x data points at a 
time. More specifically, a center point 77 of the template 108 
coincides with a given pixel 75 for a given iteration. The 
template then is moved to center over another data point 74 
that is x points away from the given pixel 75 along the 
horizontal axis 114. The template 108 is moved in each 
direction along the axis 114 using the first step size of x. A 
correlation is performed at each step. 

At step 140 the shifting along the first axis 114 and testing 
of windows is performed for a template center point 
repositioned over every y-th row of data points. Specifically, 
once the initial row of the search area has been tested, the 
template 108 is moved along the second axis 116 to another 
row that is y data points away, where y is the second axis 
step size. This next row then is tested by shifting along the 
first axis 114 using the first axis step size. A correlation is 
performed at each iteration. Then another row is tested 
which is y data points away along the second axis 116. In 
this manner the template is shifted by the second step size 
along the second axis 116 and by the first step size along the 
first axis 114 to select windows to be tested during the fast 
search. For example, in a search area which is 400 pixels by 
400 pixels, and where the first axis step size is four and the 
second axis step size is four, there are 100*100=10,000 
windows tested during the fast search. 

Of the tested windows, at step 142 the window location 
for any correlation which resulted in a correlation coefficient 
which is greater than or equal to the product of the cut value 
times a predetermined threshold value is considered a local 
match. In a preferred embodiment the cut value is the same 
for each axis. Where the cut value used along one axis differs 
from the cut value used along the other axis, either cut value 
may be used. Alternatively, an average of the cut values may 
be used. The threshold value is a predetermined value and 
signifies the minimum correlation coefficient acceptable to 
designate a window as being a match for the template. 
Typical values are 0.8 and 0.9. The specific value may vary 
based upon the search area or type of data. The specific value 
may be determined empirically for different types of data or 
search area characteristics. 
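
The selection of fast search windows (steps 138 and 140) and 
the local match test (step 142) can be sketched as follows. 
This is an illustration only. The function names are 
hypothetical, window positions are identified by the template 
center point, and every center point on the step grid through 
the starting point is treated as a candidate, with no 
allowance for the template overhanging the search area edge. 

```python
def fast_search_centers(area_rows, area_cols, step_x, step_y, start=(0, 0)):
    """Center points tested during the fast search.

    start is the (row, column) of the starting window; all points
    that are a whole number of steps away from it are returned.
    """
    first_row = start[0] % step_y
    first_col = start[1] % step_x
    return [(r, c)
            for r in range(first_row, area_rows, step_y)
            for c in range(first_col, area_cols, step_x)]

def is_local_match(corr_value, cut, threshold):
    """Step 142: local match if corr >= cut value times threshold."""
    return corr_value >= cut * threshold
```

For a 400 by 400 search area with both step sizes equal to 
four, `fast_search_centers` returns the 100*100 = 10,000 
candidate positions of the example above. 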
Local Full Search: 

Once the fast search is complete (or during the course of 
the fast search), a local full search is performed about each 
of the local matches. For a given window of the search area 
110 which is a local match, the windows which are within 
a 2-dimensional area bounded by the step sizes (for the 
respective axes) are tested by a local full search. Note that 
the windows which are exactly a step size away along either 
axis 114, 116 were already tested during the fast search. To 
do the local full search we test all the intermediary windows 
in the area between the local match and the windows plus or 
minus one step size away along either axis 114, 116. For 
example, given a first axis step size of x and a second axis 
step size of y, the windows having a center point which is 
+/-0, 1, 2, ... , x-1 data points away from the locally 
matched window along the first axis, and +/-0, 1, 2, ... , y-1 
data points away from the locally matched window along the 
second axis, are tested during the full search. The 
local match itself, however, need not be recorrelated. 

Referring to FIG. 15, the window corresponding to the 
local match has a center data point 146. The template is 
moved at a step interval of one data point in either direction 
along either axis up to but not including the data point which 
is one step size away. As the template is moved over this 
area, the windows tested during the local full search will 
have a center data point 148. FIG. 15 shows all the center 
points 148 for a given local full search as black dots for an 
implementation in which the first axis step size is four and 
the second axis step size is four. FIG. 15 shows the nearby 
center points from the fast search as open dots 75. 

A correlation is performed between the template 108 and 
each window in the vicinity of the local match. For the 
vicinity shown in FIG. 15 in which the step is four, there are 
48 additional windows tested. Any of the additional 48 
windows or the local match which has a correlation 
coefficient which equals or exceeds the threshold value is a match 
of the template. Alternatively, of the windows where the 
correlation coefficient exceeds the threshold value, only the 
window or windows having the highest correlation 
coefficient(s) are selected as matched. For example, one or 
more windows may have the same correlation coefficient 
which is highest. As another example the windows 
corresponding to the top 'n' correlation coefficients may be 
selected, where each window correlation coefficient also 
exceeds the threshold value. 
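
The set of windows visited by a local full search, and the 
selection of resultant matches, can be sketched as follows. 
This is an illustration only. The function names and the use 
of a dictionary mapping window positions to correlation 
coefficients are assumptions. 

```python
def local_full_search_offsets(step_x, step_y):
    """Offsets (dy, dx) from a local match that still need testing.

    Offsets run up to, but do not include, one step size away, and
    the local match itself (0, 0) is not recorrelated.
    """
    return [(dy, dx)
            for dy in range(-(step_y - 1), step_y)
            for dx in range(-(step_x - 1), step_x)
            if (dy, dx) != (0, 0)]

def select_matches(scores, threshold, top_n=None):
    """Windows whose correlation meets the threshold, best first.

    scores maps a window position to its correlation coefficient;
    top_n optionally keeps only the highest-scoring windows.
    """
    passing = sorted(((c, w) for w, c in scores.items() if c >= threshold),
                     reverse=True)
    if top_n is not None:
        passing = passing[:top_n]
    return [w for c, w in passing]
```

With both step sizes equal to four the offset list has 
(2*4-1)*(2*4-1)-1 = 48 entries, and with step sizes of 3 and 
4 it has (2*2+1)*(2*3+1)-1 = 34 entries, which agrees with 
the counts given in the text. 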

Once the template match is found, the corresponding 
window in the search area is the object being tracked. The 
relative position of the object within the search area 110 for 
the current image frame is compared to the relative position 
of the object in the search area for the prior image frame. The 
motion vector is derived/updated from the relative positions 
to define the movement of the object. In one embodiment, 
the vector is a linear vector derived from respective mid- 
points of the object from the two image frames. In another 
embodiment a more complex vector analysis is performed to 
identify rotation or other two-dimensional or three- 
dimensional motion of the object being tracked. 
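
For the linear-vector embodiment, the motion vector update 
and its use in predicting the next starting window can be 
sketched as follows. This is an illustration only. The 
function names and the (row, column) midpoint representation 
are assumptions, and the prediction shown is the simplest 
linear one. 

```python
def motion_vector(prev_mid, curr_mid):
    """Linear motion vector between object midpoints of two frames."""
    return (curr_mid[0] - prev_mid[0], curr_mid[1] - prev_mid[1])

def predict_next(curr_mid, vector):
    """Predict the object midpoint in the next frame (linear model)."""
    return (curr_mid[0] + vector[0], curr_mid[1] + vector[1])
```

The predicted midpoint can then serve as the center of the 
starting window for the fast search in the next frame. 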

In one embodiment the area of the image frame corre- 
sponding to the template match is output to the object 
segmentation subsystem 16, where the edge potential energy 
of the object boundary is derived. In addition, a set of data 
points along the periphery of the template match is sampled 
to serve as an estimate of the current image object boundary. 
Such estimate is input to the object segmentation subsystem 
18. 

Implementing the Correlation Function: 

The correlation coefficient for a correlation between two
data sets 'a' and 'b' is defined below. The data set 'a' is the
template 108. The data set 'b' is a window of the padded
template (or of a rotational offset of the padded template) for
the process of finding the CAPS step sizes. The data set 'b'
is a window of the search area 110 (or of a rotational offset
of the search area) for the process of identifying candidate
locations, potential template matches or template matches.
Each of data sets 'a' and 'b' may be a matrix, image or
another set of data points. The correlation coefficient, corr, is:

$$\mathrm{corr} = \frac{E\{[a - E(a)] \cdot [b - E(b)]\}}{sd(a) \cdot sd(b)}$$

which may be simplified to

$$\mathrm{corr} = \frac{E(a \cdot b) - E(a) \cdot E(b)}{sd(a) \cdot sd(b)}$$

where E(x) = expected value of data set (x),
sd(x) = standard deviation of data set (x),
and corr is between -1.0 and +1.0.
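
A minimal sketch of the simplified form of the correlation coefficient is given below. It assumes the two data sets have been flattened into equal-length sequences and that E() is the arithmetic mean over all data points; returning 0.0 for a constant data set is an assumption of the sketch, since the text does not address a zero standard deviation.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Correlation coefficient of two equal-length data sets.

    Uses corr = (E(a*b) - E(a)*E(b)) / (sd(a)*sd(b)), with E() taken
    as the arithmetic mean and sd() as the population standard deviation.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("data sets must be non-empty and of equal size")
    e_a, e_b = _mean(a), _mean(b)
    e_ab = _mean([x * y for x, y in zip(a, b)])
    sd_a = sqrt(_mean([(x - e_a) ** 2 for x in a]))
    sd_b = sqrt(_mean([(y - e_b) ** 2 for y in b]))
    if sd_a == 0 or sd_b == 0:
        return 0.0  # assumption: a flat data set is treated as uncorrelated
    return (e_ab - e_a * e_b) / (sd_a * sd_b)
```

A template correlated with a brightness-scaled copy of itself yields a coefficient of +1.0 (up to rounding), and with its negative yields -1.0, which is why the measure tolerates uniform changes in gain and offset between frames.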

Edge Energy 

Referring to FIG. 2, edge energy is generated at steps 34 
and 50. More particularly, it is edge potential energy which 
is derived. Various measures of potential energy may be 
implemented. In one embodiment a multiple level wavelet 
detection algorithm is used to extract high frequency com- 
ponents of an image. The high frequency details are ana- 
lyzed to identify image object edges. In a preferred embodi- 
ment Haar wavelet detection is used. 

The input to be processed to derive edge potential energy
is an image. In one embodiment the image is the entire
image frame. In other embodiments, the image is an image
object (e.g., the template match area found by the tracking
subsystem 14). The derived edge potential energy is an array
of potential energy for each data point (pixel) of the image.

The input image is decomposed by filtering the image
with a quadrature mirror filter (QMF) pair which brings out
the image details, while simultaneously smoothing the
image. The QMF pair includes a high pass filter for bringing
out the image details, and a low pass filter for smoothing the
image. Referring to FIG. 16 a multiple level QMF
decomposition 150 of an image frame 152 is shown. The image
frame 152 is passed through a low pass filter 154 and a high
pass filter 156 to obtain a low pass component 158 and a
high pass component 160. These components, in turn, are
filtered. The low pass component 158 is passed through a
low pass filter 162 and a high pass filter 164. The output of
low pass filter 162 is low pass residue 166. The output of
high pass filter 164 is the horizontal detail 165 of the image
frame 152.

In parallel, the high pass component 160 is passed through
a low pass filter 168 and a high pass filter 170. The output
of the low pass filter 168 is the vertical detail 169 of the
image frame 152. The output of the high pass filter 170 is the
diagonal detail 171 of the image frame 152. The low pass
residue 166 and the three detailing images 165, 169, 171 are
the first level QMF decomposition of the image frame 152.
In some embodiments a second level QMF decomposition
172 also is performed in which the low pass residue 166 is
input similarly through two stages of low pass and high pass
filters to achieve a second-level, low-pass residue and three
detailing images (horizontal detail, vertical detail and
diagonal detail). In some embodiments the same filters may be
used in the second level decomposition as were used in the
first level decomposition. For example, the low pass residue
166 is merely input to filters 154, 156 instead of the image
frame 152.
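
One level of the two-stage decomposition can be sketched as below. This is a hedged illustration rather than the patented implementation: it assumes two-tap Haar filters, no down-sampling, replication of the last sample at the image border, and that the first filter stage runs along rows (x) and the second along columns (y). The function names are hypothetical.

```python
from math import sqrt

S = 1 / sqrt(2)
LOW = (S, S)     # Haar scaling (low pass) taps
HIGH = (S, -S)   # Haar wavelet (high pass) taps


def filt_rows(img, taps):
    """Filter every row along x; the last sample is replicated at the border."""
    out = []
    for row in img:
        n = len(row)
        out.append([taps[0] * row[x] + taps[1] * row[min(x + 1, n - 1)]
                    for x in range(n)])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def filt_cols(img, taps):
    """Filter every column along y by transposing, filtering rows, transposing back."""
    return transpose(filt_rows(transpose(img), taps))


def qmf_level(img):
    """Return (residue, horizontal, vertical, diagonal) for one level."""
    low = filt_rows(img, LOW)           # low pass component 158
    high = filt_rows(img, HIGH)         # high pass component 160
    residue = filt_cols(low, LOW)       # low pass residue 166
    horizontal = filt_cols(low, HIGH)   # horizontal detail 165
    vertical = filt_cols(high, LOW)     # vertical detail 169
    diagonal = filt_cols(high, HIGH)    # diagonal detail 171
    return residue, horizontal, vertical, diagonal
```

A further level is obtained by calling qmf_level again on the returned residue, which mirrors feeding the low pass residue 166 back into the same filters.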

The high pass filtering function is a wavelet
transformation ($\psi$), while the low pass filtering function is a scaling
function ($\phi$) corresponding with the wavelet. The scaling
function causes smoothing, while the three wavelets bring
out the image details.

The scaling function and wavelet transforms in one
dimensional space are given by the equations below:

$$\phi_{a,b}(x) = \frac{1}{\sqrt{a}}\,\phi\!\left(\frac{x-b}{a}\right)$$

$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right)$$

where $\phi_{a,b}(x)$ is the family of scaling functions at scale a
and translated by b;
$\psi_{a,b}(x)$ is the family of wavelets at scale a and translated
by b;
a is the scaling factor;
b is the translation desired;
$\phi$ is $\phi_{0,0}$; and
$\psi$ is $\psi_{0,0}$.

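Two dimensional wavelets are defined as tensor products
of the one-dimensional wavelets. The two-dimensional
scaling function is $\phi(x,y) = \phi(x) \cdot \phi(y)$. The two-dimensional
wavelets are:

$$\psi^{1}(x,y) = \phi(x) \cdot \psi(y)$$

$$\psi^{2}(x,y) = \psi(x) \cdot \phi(y)$$

$$\psi^{3}(x,y) = \psi(x) \cdot \psi(y)$$
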
Although the scaling may be varied from one level of
decomposition to another, in one embodiment such scaling
is not varied.

A first level QMF decomposition is performed. For a 
second level decomposition the low pass residue 166 of the 
first level decomposition is analyzed without further down- 
sampling. In some embodiments additional levels of decom- 
position may be obtained by passing the low pass residue of 
the prior level through a two stage filtering process (similar 
to that for the prior levels). 

For any given level of decomposition there are four 
images: the low pass residue, the vertical detail, the hori- 
zontal detail and the diagonal detail. The horizontal and 
vertical detail are gradients of the image along x and y axes. 
The magnitude of the image is taken at every level of 
decomposition. The diagonal details have been omitted in 
one embodiment, because they did not contribute signifi- 
cantly. 

In a preferred embodiment up to five levels of
decomposition are used for each color component of the image frame,
in which the low pass residue from the prior stage is input
to the filters 154, 156 to generate image details and residue
for the current stage. Preferably, only data from the even
levels (e.g., levels 2, 4, and 6) are used to avoid half-pixel
shifts in the edge energy. The integration of the multiple
levels and multiple channel (color component) data is
guided by their principal component. In one implementation
the ratio of multiple-level edge gradients is selected as
1:2:4:8:16 for the five levels of decomposition. With respect
to the color components (Y, Cr, Cb), edge gradient ratios of
1:1:1 are used.

In a preferred embodiment the horizontal detail and 
vertical detail of a given level (i) of decomposition are 
combined to generate the edge potential energy (EPE) for 
that level as follows: 

$$EPE(i) = \sqrt{\text{horizontal detail}^2(i) + \text{vertical detail}^2(i)}$$

where i = i-th level of decomposition. For an embodiment in
which 5 levels of decomposition are executed, the edge
potential energies of the levels are weighted and summed
together to give the total edge potential energy (EPE) for a
given color component:

$$EPE_c = EPE_c(2) + 2 \cdot EPE_c(4) + 4 \cdot EPE_c(6) + 8 \cdot EPE_c(8) + 16 \cdot EPE_c(10)$$

where c is the color component being processed. The overall 
edge potential energy for the entire frame, inclusive of all 
color components is the weighted sum of the energy from 
the different color components. For a weighting factor of (1, 
1, 1) the total potential energy is given by: 

$$\text{Total Edge Potential Energy} = EPE_{Y} + EPE_{Cr} + EPE_{Cb}$$

where Y, Cr and Cb are the color components. In other 
embodiments R,G and B color components or those of 
another color component model may be used. The weighting 
factor may vary depending on the color components model 
being used. 
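
The combination of levels and color components described above can be sketched as follows. The sketch assumes that the horizontal and vertical detail images for levels 2, 4, 6, 8 and 10 are already available as equally sized two-dimensional lists; the names level_epe, component_epe and total_epe are hypothetical.

```python
from math import sqrt

# Level -> weight, following the 1:2:4:8:16 ratio given in the text.
LEVEL_WEIGHTS = {2: 1, 4: 2, 6: 4, 8: 8, 10: 16}


def level_epe(horizontal, vertical):
    """Per-pixel EPE(i) = sqrt(horizontal^2 + vertical^2) for one level."""
    return [[sqrt(h * h + v * v) for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def component_epe(details):
    """Weighted sum over levels; details maps level -> (horizontal, vertical)."""
    total = None
    for level, weight in LEVEL_WEIGHTS.items():
        epe = level_epe(*details[level])
        if total is None:
            total = [[weight * e for e in row] for row in epe]
        else:
            total = [[t + weight * e for t, e in zip(t_row, e_row)]
                     for t_row, e_row in zip(total, epe)]
    return total


def total_epe(per_component, weights=(1, 1, 1)):
    """Weighted sum over color components (e.g., Y, Cr, Cb)."""
    rows, cols = len(per_component[0]), len(per_component[0][0])
    return [[sum(w * comp[y][x] for w, comp in zip(weights, per_component))
             for x in range(cols)] for y in range(rows)]
```

For a pixel whose horizontal and vertical details are 3 and 4 at every level, each level contributes 5, a component totals 5*(1+2+4+8+16) = 155, and three equally weighted components total 465.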

The total edge potential energy is an array having an 
energy value for each pixel of the image processed. The edge 
potential energy is input to the active contour model for use 
in object segmentation. In some embodiments the edge 
energy is input to the CAPS process. When providing an 
input to the CAPS process, the edge energy is being used to 
predict where the object being tracked is located in a current 
image frame. For such an embodiment, the "Generate Edge 
Energy" step 50 is executed prior to the tracking step 48 (see 
FIG. 2). 

Note that in various embodiments, the edge potential
energy is derived before or after the CAPS model executes.
When the edge potential energy is calculated first, a
predicted location for the image object may be derived with the
edge potential energy as an input to the tracking subsystem
14. When the CAPS model executes first, the image being
processed for edge potential energy is the template matched
portion of the image frame.

Object Segmentation

Once an image object has been identified, the image
boundary (i.e., edge) is segmented to more accurately model
the object edges. In one embodiment the object
segmentation subsystem 18 is based upon an active contour modelling
method. However, other segmentation methods are known
and may be substituted. The active contour modelling
method (see subsystem 18) performs segmentation at step 52
(FIG. 2) to segment the image object boundary.

Input to the active contour model is the derived total edge
potential energy and a current image object boundary. The
total edge potential energy is derived at step 50 (see FIG. 2).
For an initial frame the current image object boundary is the
boundary input to the system at step 22 (see FIG. 2). The set
of data points for the current image object boundary are used
by the active contour model at step 36 (see FIG. 2).

For subsequent image frames, the current image object
boundary is derived by the tracking subsystem 14, as
described above. The set of data points for the current image
object boundary are used by the active contour model at step
52 (see FIG. 2).

Referring to FIG. 17, a flow chart 192 of the active
contour model includes a first step 194 at which edge points
are received by the object segmentation subsystem 18. The
number of input edge points may vary. At step 196, the edge
points which are too close together are eliminated (i.e., less
than a first threshold distance apart). In one embodiment
points are considered too close together when they are less
than 2.5 pixels apart. In other embodiments the distance may
be smaller or larger. At step 198 additional points are added
by interpolation where the adjacent points are too far apart
(i.e., greater than a second threshold distance apart). In one
embodiment points are considered too far apart
when they are greater than 6.0 pixels apart. In other
embodiments the distance may be smaller or larger than 6.0 while
being larger than the first threshold distance.

At this stage of the process there are a given number of
current edge points, as modified from the input edge points.
Although the number of edge points may vary from contour
to contour, we will describe the process for N current edge
points. At step 200 the subsystem 18 performs global
relaxation on the N current edge points. To do so, for each
current edge point, M candidate points are selected from a
box around the current edge point. In one embodiment M
equals 4, although in various embodiments the number of
candidate points may vary. In one embodiment a 5x5 box is
used. However, the size of the box may vary. A larger box
leads to a more flexible contour, but more computation time.
The shape of the box may be square, rectangular or another
shape.

Referring to FIG. 18, a 5x5 box 174 of pixels surrounding
the current edge point 176 is divided into four regions 178,
180, 182, 184. Within each region there are 6 pixels. One of
those 6 pixels is selected in each region to be a candidate
pixel ('point') which may potentially replace the current
edge point 176 as an object boundary edge point. Thus, 4
candidate points 186-189 are selected for each current edge
point 176. In alternative embodiments a different number of
candidate points, or another method of selecting candidate
points, may be used.

For a given region 178, the candidate point is the pixel
among the 6 potential points which has the highest edge
potential energy. For an image object boundary which has N
current edge points, and where there are M (e.g., four)
alternative candidate points for each one of the N points,
there are (M+1)^N (e.g., 5^N) possible contours from which to
select the modelled image object boundary. At step 202 a
travel algorithm is applied to the current edge points with the
alternative candidate points to select an optimal contour
path. FIG. 19 shows a travel path diagram for the possible
contours. There are (M+1) (e.g., 5) points in each column.
The five points correspond to a current edge point 176 and
four candidate edge points 186-189 for such current edge
point 176. The number of points in each row (which also
equals the number of columns) corresponds to N.

To choose the optimal image object boundary, a starting
location 190 on the current contour is selected. Such location
190 corresponds to any given current edge point 176 and its
M=4 candidate edge points 186-189. From each of such
M+1=5 points, an optimal path is derived. Of the 5 resulting
paths the most optimal path then is selected to be the
modelled object boundary. The process for deriving the
optimal path is the same for each of the M+1 paths to be
derived.

Referring to FIG. 20, consider a path that is to start from
edge point 176s. A segment of the path is constructed by
advancing to one of the M+1 points in the adjacent column
s+1. Thus, one choice is to step to point 176(s+1). Another
choice is to step to candidate point 186(s+1). The other
choices include 187(s+1), 188(s+1) and 189(s+1). Only one
choice is selected. The choice is made by determining for
which of the M+1=5 points in column (s+1) the resulting
path has the least difference in energy (e.g., the most energy
savings). The selected point is preserved along with a
distance of how far such point is from the current point in
column s+1. Consider an example where point 186(s+1) is
selected. Such point is preserved along with a distance value
(e.g., in pixels) of how many pixels such point is from the
point 176(s+1).

Similarly, to construct the next segment of the path a point
among the M+1 points in column s+2 is selected. For each
segment along the path only one of the M+1=5 potential
segments is preserved, along with a distance from such
point to the current point 176 in the same column.

The same process is performed to derive a path which
starts from point 186s. A first segment of the path is
constructed by advancing to one of the M+1 points in the
adjacent column s+1. One choice is to step to point 176(s+1).
Another choice is to step to candidate point 186(s+1).
The other choices include 187(s+1), 188(s+1) and 189(s+1).
Only one choice is selected. The choice is made by
determining for which of the M+1=5 points in column (s+1)
the resulting path has the most difference in energy relative
to the current contour 173. The selected point is preserved
along with a distance of how far such point is from the
current point in column s+1. Respective paths starting from
point 187s, 188s and 189s, respectively, are constructed in
the same manner. The M+1 resulting paths then are
compared to see which one is the most optimal path (e.g., most
difference in energy relative to the current contour 173).

According to this method, rather than perform 5^N
computations (one for each one of the potential contours),
only (M+1)*(M+1)*N (e.g., 5*(5*N)) computations
occur.
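
The travel algorithm described above can be sketched as follows. This is a simplified illustration under stated assumptions: the per-segment energy difference is supplied by the caller as a hypothetical step_gain function, the larger value is treated as the better one, and the path is left open (the closing segment back to the starting column is omitted).

```python
def travel(n_columns, n_choices, step_gain):
    """Build one path per starting point and keep the best of them.

    n_columns: N, the number of current edge points (columns).
    n_choices: M+1, the current point plus its M candidates.
    step_gain(col, prev_choice, choice): hypothetical caller-supplied
        energy difference for the segment ending at `choice` in `col`.
    Returns (best_path, best_total); best_path[c] indexes the point
    chosen in column c.
    """
    best_path, best_total = None, float("-inf")
    for start in range(n_choices):            # M+1 starting points
        path, total = [start], 0.0
        for col in range(1, n_columns):       # advance column by column
            gains = [step_gain(col, path[-1], k) for k in range(n_choices)]
            k_best = max(range(n_choices), key=gains.__getitem__)
            path.append(k_best)               # only one choice is preserved
            total += gains[k_best]
        if total > best_total:
            best_path, best_total = path, total
    return best_path, best_total
```

Because only one segment is kept at each step, the number of step_gain evaluations grows as roughly (M+1)*(M+1)*N rather than (M+1)^N.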
The energy difference between a contour which steps to
the current point for a given point among the 5 potential
points at a given step is derived as follows:

$$\Delta E = \sum_{i} \delta E_i$$

where,

$$f(u_1, u_2, v_1, v_2) = \int_{(u_1,u_2)}^{(v_1,v_2)} TEPE \, ds$$

where TEPE = total edge potential energy;
ds = derivative with respect to s (s = length of contour
segment between two points);
$f_i^0$ represents the integral of the total edge potential energy
along the i-th segment of the current contour;
$f_i^1$ represents the integral of the total edge potential energy
along the i-th segment of the candidate contour;
$d_i^0$ represents the length of the i-th segment of the current
contour;
$d_i^1$ represents the length of the i-th segment of the
candidate contour;
$d_i^2$ represents the distance between the two segments
when we look at them as vectors;
$d_i^3$ represents the distance between the i-th current
contour point and the i-th candidate contour point.

The terms $d_i^0$ and $d_i^1$ correspond to tension in the contour.
The term $d_i^2$ corresponds to stiffness for keeping the shape
of the modelled contour similar to the current contour. The
term $d_i^3$ corresponds to pressure to keep the candidate
contour close to the current contour. The optimal contour is
the one having the optimal $\Delta E$. In one embodiment this is the
maximum $\Delta E$. In other embodiments negative TEPE is used
instead, so optimum becomes the minimum $\Delta E$.

At the completion of step 202, the optimal contour is a
polygon. As a result, the points selected by the travel
algorithm at step 202 may or may not be on the actual
smooth object boundary. Thus, fine tuning is performed at
step 204.

Each segment of the optimal contour includes the points
selected using the travel algorithm as end points, along with
the pixels in between. The pixels in between, although not
part of the travel problem, are part of the input image being
processed. In the fine tuning process the pixel along the
segment having the highest edge potential energy is selected
as the most reliable point of such group for being on the
actual object boundary. A most reliable point is selected for
each segment of the polygon (i.e., optimal contour path
output from the travel algorithm). Points then are selected to
be filled in between the most reliable points using the
criteria: (i) a new point should be 8-connected to a
previously selected boundary point, and (ii) the distance of the
new boundary point to the next boundary point should be
less than the distance from the previous boundary point to
the next boundary point.
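
Selection of the most reliable point on one polygon segment can be sketched as below. The sketch assumes the edge potential energy is a two-dimensional list indexed as epe[y][x] and that the pixels of a segment are found by simple linear interpolation between its end points; both function names are hypothetical, and the subsequent 8-connected fill-in is not shown.

```python
def segment_pixels(p0, p1):
    """Pixels (x, y) on the straight segment from p0 to p1, end points included."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [p0]
    return [(round(x0 + (x1 - x0) * t / steps),
             round(y0 + (y1 - y0) * t / steps))
            for t in range(steps + 1)]


def most_reliable_point(epe, p0, p1):
    """Pixel along the segment with the highest edge potential energy."""
    return max(segment_pixels(p0, p1), key=lambda p: epe[p[1]][p[0]])
```

Applying most_reliable_point to every segment of the polygon yields one anchor point per segment, between which the boundary is then filled in according to criteria (i) and (ii).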

Once the object boundary has been fine tuned, the active 
contour process is repeated with the object boundary of the 
prior iteration being the current edge points. Global relax- 
ation then is performed again at step 200 to select alternative 
candidate points for the current edge points. Then the travel 
algorithm is reapplied at step 202, followed by fine tuning at 
step 204. After the fine tuning step, at step 206 an iteration 
count is tested to determine if a maximum number of 
iterations have been executed. If a maximum number of 
iterations has occurred, then the edge points making up the 
fine tuned boundary are the image object boundary points 
output at step 38 (see FIG. 2). If not, then at step 208 the 
contour is checked to see if it has changed from the prior 
iteration. If it has not changed then the edge points making 
up the fine tuned boundary are the image object boundary 
points. If the contour has changed, then the process is 
repeated commencing at step 200 with the global relaxation 
process. 
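
The iteration control of steps 200 through 208 can be sketched as a loop. The three stage functions are passed in as parameters and are placeholders for the global relaxation, travel algorithm and fine tuning steps; their names, signatures and the default iteration limit are assumptions of the sketch.

```python
def refine_contour(points, relax, travel_step, fine_tune, max_iterations=10):
    """Repeat relaxation, travel and fine tuning until the contour settles.

    relax(points) -> candidates             (step 200)
    travel_step(points, candidates) -> polygon   (step 202)
    fine_tune(polygon) -> boundary points   (step 204)
    All three callables are hypothetical stand-ins for the stages.
    """
    for _ in range(max_iterations):           # step 206: iteration limit
        candidates = relax(points)
        polygon = travel_step(points, candidates)
        tuned = fine_tune(polygon)
        if tuned == points:                   # step 208: contour unchanged
            return tuned
        points = tuned
    return points
```

The loop ends either when an iteration leaves the contour unchanged or when the iteration limit is reached, and in both cases the fine tuned boundary of the last iteration is returned.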

Exemplary implementations of the object segmentation
methods include, but are not limited to, video encoding,
video editing and computer vision. For example, the
segmentation and modelling may be performed in the context of
MPEG-4 video encoding and content based video editing in 
which video objects from different video clips are grouped 
together to form a new video sequence. As an example of a 
computer vision application, the segmentation and model- 
ling may be performed with limited user assistance to track 
a target (e.g., military or surveillance context). Once the 
target is locked with user assistance, the tracking and 
segmentation methods automatically provide information to 
follow the target. 

Encoder Subsystem 

When other processing is complete, the encoder sub- 
system 19 is activated to encode and compress the finalized 
image frame or video sequence into a desired format. In one 
embodiment an MPEG-4 encoder is implemented.

In one embodiment the operator is able to analyze the
output quality by viewing peak signal to noise ratios per
color component or per number of bits encoded. In addition,
the operator can alter some encoding parameters and view
the results for many different encodings to find the encoding
settings that provide the desired trade-off to achieve a
satisfactory image quality at some number of bits encoded
per pixel. By segmenting the object image the operator is
able to provide more bits for encoding the segmented image
object(s) than for the other portions of the image frame(s).
Thus, increased precision is achieved for the image object(s)
of interest.
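
A peak signal to noise ratio for one color component can be computed as sketched below. The text does not give a formula; the sketch uses the conventional definition, 10*log10(peak^2 / mean squared error), and assumes 8-bit samples (peak of 255) stored as two-dimensional lists. The function name psnr is chosen for the example.

```python
from math import log10


def psnr(original, decoded, peak=255.0):
    """Peak signal to noise ratio, in dB, between two equally sized images."""
    squared = [(o - d) ** 2
               for o_row, d_row in zip(original, decoded)
               for o, d in zip(o_row, d_row)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * log10(peak * peak / mse)
```

Calling psnr once per color component, for the whole frame or for the segmented object region only, gives the kind of per-component figure the operator would compare across encoder settings.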

Meritorious and Advantageous Effects 

An advantage of the invention is that image segmentation 
techniques are performed in HSI color space where color 
sensing properties more closely resemble human vision. 
According to another advantage of this invention, object 
boundaries are preserved while noise level is significantly 
reduced and the noise variance is made more uniform. 

Although preferred embodiments of the invention have
been illustrated and described, various alternatives,
modifications and equivalents may be used. Therefore, the
foregoing description should not be taken as limiting the scope
of the inventions which are defined by the appended claims.
What is claimed is: 

1. A method for segmenting an image frame of pixel data, 
the image frame including a plurality of pixels, the pixel data 
corresponding to the pixels, the method comprising: 

for each pixel of the image frame, converting the corre- 
sponding pixel data into hue, saturation, intensity color 
space to achieve HSI pixel data having a hue 
component, a saturation component and an intensity 
component; 

filtering the HSI pixel data to achieve filtered HSI pixel 
data, wherein said filtering includes: respectively 
selecting for each one HSI pixel of the image frame, 
based upon a value of the corresponding intensity 
component of said each one HSI pixel of the image 
frame, a first filter kernel from a plurality of filter 
kernels; and respectively filtering the saturation com- 
ponent of each one HSI pixel using the first filter kernel 
selected for said one HSI pixel; 

identifying presence of an image object in the image 
frame; and 

segmenting the image frame to define a set of filtered HSI 
pixel data corresponding to the image object. 

2. The method of claim 1, further comprising the step of: 
encoding the image frame, wherein the pixel data corre- 
sponding to the image object is encoded at a higher bit 
rate than other pixel data corresponding to another 
portion of the image frame. 

3. The method of claim 1, further comprising the step of 
performing a color gradient operation on the filtered HSI 
pixel data using a derivative of Gaussian operator; and
wherein the step of segmenting comprises segmenting the 
image frame after the color gradient operation is performed, 
wherein the set of filtered HSI pixel data corresponding to 
the image object is filtered HSI pixel data which has 
received the color gradient operation.

4. The method of claim 1 for segmenting a plurality of
image frames included within a motion video sequence of
image frames, wherein the steps of converting, filtering,
identifying, segmenting and encoding are performed on each
one image frame among the plurality of image frames.

5. The method of claim 1, wherein the step of filtering the
HSI pixel data comprises:

applying an averaging filter having a kernel size adapted
for each pixel, the averaging filter for increasing
uniformity of noise distribution of the pixel data within
hue, saturation, intensity color space.

6. The method of claim 1, in which said HSI pixel data
filtering further comprises:

respectively selecting for each one HSI pixel of the image
frame, based upon a product of the intensity component
and the saturation component of said each one HSI
pixel of the image frame, a second filter kernel from the
plurality of filter kernels; and

respectively filtering the hue component of said each one
HSI pixel using the second filter kernel selected for said
each one HSI pixel.

7. The method of claim 1, in which the step of segmenting
comprises applying an active contour model to define an
edge of the image object.

8. The method of claim 1, in which the step of identifying
the image object comprises identifying a first set of filtered
HSI pixels corresponding to the image object, and in which
the step of segmenting comprises:

identifying a second set of N filtered HSI pixels
corresponding to an initial estimate of a desired contour of
the image object, wherein said second set define a
current object contour, the first set including at least the
second set of HSI filtered pixels;

deriving edge potential energy for the first set of filtered
HSI pixels;

refining the current object contour into the desired contour
using the current object contour and the derived edge
potential energy.

9. A system for segmenting an image frame of pixel data,
the image frame including a plurality of pixels, the pixel data
corresponding to the pixels, the system comprising:

a processor which converts, for each pixel of the image
frame, the corresponding pixel data into hue,
saturation, intensity color space to achieve HSI pixel
data having a hue component, a saturation component
and an intensity component;

a selector which respectively selects for each one HSI
pixel of the image frame, based upon a value of the
corresponding intensity component of said each one
HSI pixel of the image frame, a first filter kernel from
a plurality of filter kernels;

a filter receiving the HSI pixel data which generates
filtered HSI pixel data, the filter including a saturation
component filter which respectively filters the
saturation component of each one HSI pixel using the first
filter kernel selected for said one HSI pixel;

a processor which identifies presence of an image object
in the image frame;

a processor which segments the image frame to define a
set of filtered HSI pixel data corresponding to the
image object.

10. The system of claim 9, further comprising:

an encoder which encodes the image frame, wherein the
pixel data corresponding to the image object is encoded
at a higher bit rate than other pixel data corresponding
to another portion of the image frame.

11. The system of claim 9, further comprising a processor
which performs a color gradient operation on the filtered
HSI pixel data using a derivative of Gaussian operator; and
wherein the segmented image frame is segmented after the
color gradient operation is performed and the set of filtered
HSI pixel data corresponding to the image object is filtered
HSI pixel data which has received the color gradient
operation.

12. The system of claim 9 in which a plurality of image
frames included within a motion video sequence of image
frames are segmented and encoded, and further comprising
a processor which tracks the image object among the
plurality of image frames.

13. The system of claim 9, wherein the HSI pixel filter
comprises:

an averaging filter having a plurality of kernels of
differing kernel size, wherein the kernel size is adapted for
each pixel to increase uniformity of noise distribution
within hue, saturation, intensity color space.

14. The system of claim 9, in which the selector is a first
selector and further comprising a second selector which
respectively selects for each one HSI pixel of the image
frame, based upon a product of the intensity component and
the saturation component of said each one HSI pixel of the
image frame, a second filter kernel from the plurality of filter
kernels; and wherein said HSI pixel data filter further
includes:

a hue component filter which respectively filters the hue
component of said each one HSI pixel using the second
filter kernel selected for said each one HSI pixel.

15. The system of claim 9, in which the processor which
segments applies an active contour model to define an edge
of the image object.

16. The system of claim 9, in which the processor which 
identifies the image object identifies a first set of filtered HSI 
pixels corresponding to the image object, and in which the 
processor which segments comprises: 

means for identifying a second set of N filtered HSI pixels 
corresponding to an initial estimate of a desired contour 
of the image object, wherein said second set defines a 
current object contour, the first set including at least the 
second set of HSI filtered pixels; 

means for deriving edge potential energy for the first set 
of filtered HSI pixels; 

means for refining the current object contour into the 
desired contour using the current object contour and the 
derived edge potential energy. 

17. A computer readable storage medium for storing
processor-executable instructions and processor-accessible
data for segmenting an image frame of pixel data, the image 
frame including a plurality of pixels, the pixel data corre- 
sponding to the pixels, the medium comprising: 

means which converts, for each pixel of the image frame, 
the corresponding pixel data into hue, saturation, inten- 
sity color space to achieve HSI pixel data having a hue 
component, a saturation component and an intensity 
component; 

means for filtering the HSI pixel data to generate filtered 
HSI pixel data; wherein said filtering means includes: 
means respectively selecting for each one HSI pixel of 
the image frame, based upon a value of the correspond- 
ing intensity component of said each one HSI pixel of 
the image frame, a first filter kernel from a plurality of 
filter kernels; and means for respectively filtering the 
saturation component of each one HSI pixel using the 
first filter kernel selected for said one HSI pixel; and 

means for identifying presence of an image object in the 
image frame. 

18. The medium of claim 17, further comprising:
means for segmenting the image frame to define a set of
filtered HSI pixel data corresponding to the image
object.

19. The storage medium of claim 17, further comprising:
means for encoding the image frame, wherein the pixel
data corresponding to the image object is encoded at a
higher bit rate than other pixel data corresponding to
another portion of the image frame.

20. The storage medium of claim 17, further comprising 
means which performs a color gradient operation on the 
filtered HSI pixel data using a derivative of Gaussian 
operator; and wherein the segmented image frame is seg- 
mented after the color gradient operation is performed and 
the set of filtered HSI pixel data corresponding to the image 
object is filtered HSI pixel data which has received the color 
gradient operation. 

21. The storage medium of claim 17, wherein the HSI 
pixel filtering means comprises: 

a plurality of kernels of differing kernel size, wherein the 
kernel size is adapted for each pixel to increase uni- 
formity of noise distribution within hue, saturation, 
intensity color space. 

22. The storage medium of claim 17, in which the selecting
means is a first selecting means and wherein the HSI pixel
filtering means further comprises:

a second selecting means which respectively selects for
each one HSI pixel of the image frame, based upon a
product of the intensity component and the saturation
component of said each one HSI pixel of the image
frame, a second filter kernel from the plurality of filter
kernels; and

a hue component filter which respectively filters the hue 
component of said each one HSI pixel using the second 
filter kernel selected for said each one HSI pixel. 

23. A method for filtering an image portion, the image 
portion comprising a plurality of pixel data, the method 
comprising the steps of: 

converting the plurality of pixel data into hue, saturation, 
intensity color space to achieve HSI pixel data having 
a hue component, a saturation component and an inten- 
sity component; 

respectively selecting and applying for each one pixel of 
the HSI pixel data, a first filter kernel from a plurality 
of filter kernels, said first kernel filtering the saturation 
component of said each one pixel of the HSI pixel data 
to achieve a filtered saturation component of the HSI 
pixel data; and 

respectively selecting and applying for each one pixel of 
the HSI pixel data, a second filter kernel from the 
plurality of filter kernels, said second kernel filtering 
the hue component of said each one pixel of the HSI 
pixel; 

wherein said selecting the first filter kernel to filter the 
saturation component comprises testing the intensity 
component of the corresponding HSI pixel data against 
a set of threshold values to determine which filter 
kernel among the plurality of filter kernels is applied to 
filter the saturation component. 

24. The method of claim 23, in which the step of selecting 
the second filter kernel to filter the hue component com- 
prises testing a product of the intensity component and the 
filtered saturation component of the corresponding HSI pixel 
data against a set of threshold values to determine which 
filter kernel among the plurality of filter kernels is applied to 
filter the hue component. 
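As an illustration of the selection logic recited in claims 23 and 24, the following is a minimal sketch, not the patented implementation. It assumes each component is a 2-D list of floats in the range 0 to 1, uses square averaging kernels clipped at the image border, and treats hue as a plain scalar (hue wraparound is ignored). The function names, the threshold values and the kernel sizes are all hypothetical; the claims do not specify them.

```python
def pick_kernel_size(value, thresholds, sizes):
    """Test `value` against ascending thresholds and return the matching kernel size.

    `sizes` must have one more entry than `thresholds`; the last size is used
    when `value` is at or above every threshold.
    """
    for limit, size in zip(thresholds, sizes):
        if value < limit:
            return size
    return sizes[-1]


def box_average(channel, row, col, size):
    """Mean of the size x size neighborhood around (row, col), clipped at borders."""
    half = size // 2
    rows, cols = len(channel), len(channel[0])
    total, count = 0.0, 0
    for r in range(max(0, row - half), min(rows, row + half + 1)):
        for c in range(max(0, col - half), min(cols, col + half + 1)):
            total += channel[r][c]
            count += 1
    return total / count


def adaptive_filter(hue, sat, inten, sat_thresholds, hue_thresholds, sizes):
    """Filter saturation, then hue, with a per-pixel kernel size."""
    rows, cols = len(inten), len(inten[0])
    # Saturation: the kernel is chosen from the intensity component alone.
    sat_f = [[box_average(sat, r, c,
                          pick_kernel_size(inten[r][c], sat_thresholds, sizes))
              for c in range(cols)] for r in range(rows)]
    # Hue: the kernel is chosen from intensity times the FILTERED saturation.
    hue_f = [[box_average(hue, r, c,
                          pick_kernel_size(inten[r][c] * sat_f[r][c],
                                           hue_thresholds, sizes))
              for c in range(cols)] for r in range(rows)]
    return hue_f, sat_f


# Hypothetical settings: smaller test values select larger kernels.
SIZES = (7, 5, 3, 1)
SAT_THRESHOLDS = (0.1, 0.2, 0.4)
HUE_THRESHOLDS = (0.05, 0.1, 0.2)
```

In this sketch a kernel size of 1 leaves the pixel unchanged, so bright, well-saturated pixels pass through while dark or weakly saturated pixels are averaged over a wider neighborhood.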

25. The method of claim 23, in which the image portion 
is an image frame among a sequence of image frames, the 
method further comprising the steps of: 

identifying presence of an image object in the image 
frame; 

segmenting the image frame to define a set of filtered HSI 
pixel data corresponding to the image object. 

26. The method of claim 25, further comprising the step 
of performing a color gradient operation on the filtered HSI 
pixel data using a derivative of Gaussian operator; and 
wherein the step of segmenting comprises segmenting the 
image frame after the color gradient operation is performed, 
wherein the set of filtered HSI pixel data corresponding to 
the image object is filtered HSI pixel data which has 
received the color gradient operation. 
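For readers unfamiliar with the derivative of Gaussian operator named in claim 26, the following is a minimal one-dimensional sketch under stated assumptions. It samples the first derivative of a Gaussian and convolves it with a single row of one component, replicating edge samples at the borders. The function names, `sigma` and `radius` are hypothetical, and how the per-component responses are combined into a color gradient is not shown, since the claim does not specify it.

```python
import math


def dog_kernel(sigma, radius):
    """Sampled first derivative of a Gaussian: -x / sigma^2 * exp(-x^2 / (2 sigma^2))."""
    return [-x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]


def convolve_row(row, kernel):
    """Convolve one row with the kernel, clamping indices at the row ends."""
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # True convolution: kernel offset x = k - radius reads sample i - x.
            j = min(max(i - (k - radius), 0), n - 1)
            acc += weight * row[j]
        out.append(acc)
    return out


# A rising step edge in one filtered component (hypothetical data).
step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
response = convolve_row(step, dog_kernel(sigma=1.0, radius=3))
```

The response is close to zero over the flat regions and peaks at the step, which is why such an operator helps locate object boundaries before segmentation.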

27. A system for filtering an image portion, the image 
portion comprising a plurality of pixel data, the system 
comprising: 

a processor which converts the plurality of pixel data into 
hue, saturation, intensity color space to achieve HSI 
pixel data having a hue component, a saturation com- 
ponent and an intensity component; and 

an averaging filter having a kernel size adapted for each 
pixel, the averaging filter increasing uniformity of 
noise distribution of the HSI pixel data, the averaging 
filter comprising: 

means for respectively selecting and applying for each 
one pixel of the HSI pixel data, a first filter kernel
from a plurality of filter kernels, said first kernel
filtering the saturation component of said each one 
pixel of the HSI pixel data to achieve a filtered 
saturation component of the HSI pixel data; and 

means for respectively selecting and applying for each 
one pixel of the HSI pixel data, a second filter kernel 
from the plurality of filter kernels, said second kernel 
filtering the hue component of said each one pixel of 
the HSI pixel; 

wherein the means for selecting the first filter kernel to 
filter the saturation component comprises means for 
testing the intensity component of the corresponding 
HSI pixel data against a set of threshold values to 
determine which filter kernel among the plurality of 
filter kernels is applied to filter the saturation com- 
ponent. 

28. The system of claim 27, in which the means for 
selecting the second filter kernel to filter the hue component 
comprises means for testing a product of the intensity 
component and the filtered saturation component of the 
corresponding HSI pixel data against a set of threshold 
values to determine which filter kernel among the plurality 
of filter kernels is applied to filter the hue component. 

29. The system of claim 27, in which the image portion is 
an image frame among a sequence of image frames, the 
system further comprising: 

means for identifying presence of an image object in the
image frame; and

means for segmenting the image frame to define a set of
filtered HSI pixel data corresponding to the image
object.

30. The system of claim 29, further comprising: means for 
performing a color gradient operation on the filtered HSI 
pixel data using a derivative of Gaussian operator; and 
wherein the segmenting means comprises means for seg- 
menting the image frame after the color gradient operation 
is performed, wherein the set of filtered HSI pixel data
corresponding to the image object is filtered HSI pixel data
which has received the color gradient operation.

31. A computer readable storage medium for storing
processor-executable instructions and processor-accessible
data for filtering an image portion, the image portion
comprising a plurality of pixel data, the medium comprising:

means for converting the plurality of pixel data into hue,
saturation, intensity color space to achieve HSI pixel
data having a hue component, a saturation component
and an intensity component;

means for respectively selecting and applying for each 
one pixel of the HSI pixel data, a first filter kernel from 
a plurality of filter kernels, said first kernel filtering the 
saturation component of said each one pixel of the HSI 
pixel data to achieve a filtered saturation component of
the HSI pixel data; and

means for respectively selecting and applying for each
one pixel of the HSI pixel data, a second filter kernel 
from the plurality of filter kernels, said second kernel 
filtering the hue component of said each one pixel of 
the HSI pixel; 

wherein the means for selecting the first filter kernel to 
filter the saturation component comprises means for 
testing the intensity component of the corresponding 
HSI pixel data against a set of threshold values to 
determine which filter kernel among the plurality of 
filter kernels is applied to filter the saturation compo- 
nent. 

32. The medium of claim 31, in which the means for 
selecting the second filter kernel to filter the hue component 
comprises means for testing a product of the intensity 
component and the filtered saturation component of the 
corresponding HSI pixel data against a set of threshold 
values to determine which filter kernel among the plurality
of filter kernels is applied to filter the hue component.